* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-12 21:13 UTC
To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji35kySP4Q8cUpYCXR2P9DBwPcYeWjr_HX==TNX9CPW9NA@mail.gmail.com>
Apologies for the verbosity; I'm just adding some more info, which is now
making me lose hope. Using parted -l instead of fdisk gives me this:
[root@lamachine ~]# parted -l
Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 32.3kB 31.5GB 31.5GB primary raid
2 31.5GB 294GB 262GB primary ext4 raid
3 294GB 500GB 207GB extended
5 294GB 326GB 32.2GB logical
6 336GB 339GB 3644MB logical raid
Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
2 210MB 262GB 262GB primary raid
3 262GB 271GB 8389MB primary linux-swap(v1)
4 271GB 500GB 229GB extended
5 271GB 303GB 32.2GB logical
6 313GB 317GB 3644MB logical raid
Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? R
Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? I
Error: The backup GPT table is corrupt, but the primary appears OK, so
that will be used.
OK/Cancel? O
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdd: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 2199GB 2199GB raid
2 2199GB 2736GB 537GB
Error: Invalid argument during seek for read on /dev/sde
Retry/Ignore/Cancel? C
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdf: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 1049kB 210MB 209MB primary ext4 boot
2 210MB 31.7GB 31.5GB primary raid
3 31.7GB 294GB 262GB primary ext4 raid
4 294GB 500GB 206GB extended
5 294GB 326GB 32.2GB logical
6 336GB 340GB 3644MB logical raid
Model: Linux Software RAID Array (md)
Disk /dev/md2: 524GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:
Number Start End Size File system Flags
1 0.00B 524GB 524GB ext4
Error: /dev/md126: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md126: 31.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md127: 96.6GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
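A note on units when comparing this output with mdadm and lsdrv: parted prints decimal GB, while mdadm's "Array Size" and the block counts in /proc/mdstat are in 1 KiB units. Below is a minimal sketch of the conversion, assuming awk is available. The function name kib_to_units is made up for this illustration and is not part of any tool.

```shell
#!/bin/bash
# kib_to_units (hypothetical helper): convert a count of 1 KiB blocks,
# as shown in the "Array Size" field of mdadm --detail, into GiB and
# decimal GB, rounded to two places.
kib_to_units() {
    awk -v kib="$1" 'BEGIN {
        printf "%.2f GiB %.2f GB\n", kib / (1024 * 1024), kib * 1024 / 1e9
    }'
}

# md126: Array Size : 30719936
kib_to_units 30719936
# md2: Array Size : 511999872
kib_to_units 511999872
```

The results should line up with the figures mdadm prints in parentheses next to "Array Size", and with the 31.5GB and 524GB that parted shows for md126 and md2.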
On 12 September 2016 at 20:41, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> ok, I just adjusted system time so that I can start tracking logs.
>
> What I'm noticing, however, is that fdisk -l is not giving me the expected
> partitions (I was expecting at least 2 partitions on every 2.7 TiB disk,
> similar to what I have on sdd):
>
> [root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde}
> Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device Boot Start End Sectors Size Id Type
> /dev/sdc1 1 4294967295 4294967295 2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: gpt
> Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9
>
> Device Start End Sectors Size Type
> /dev/sdd1 2048 4294969343 4294967296 2T Linux RAID
> /dev/sdd2 4294969344 5343545343 1048576000 500G Linux filesystem
> Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device Boot Start End Sectors Size Id Type
> /dev/sde1 1 4294967295 4294967295 2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> [root@lamachine lamachine_220315]#
>
> what could've happened here? any ideas why the partition tables ended
> up like that?
>
> From previous information I have an idea of what md128 and md129
> are supposed to look like (I also noticed that the device names
> changed):
>
> # md128 and md129 details From an old command output
> /dev/md128:
> Version : 1.2
> Creation Time : Fri Oct 24 15:24:38 2014
> Raid Level : raid5
> Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
> Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB)
> Raid Devices : 3
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Sun Mar 22 06:20:08 2015
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Name : lamachine:128 (local to host lamachine)
> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
> Events : 4041
>
> Number Major Minor RaidDevice State
> 0 8 49 0 active sync /dev/sdd1
> 1 8 65 1 active sync /dev/sde1
> 3 8 81 2 active sync /dev/sdf1
> /dev/md129:
> Version : 1.2
> Creation Time : Mon Nov 10 16:28:11 2014
> Raid Level : raid0
> Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
> Raid Devices : 3
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Mon Nov 10 16:28:11 2014
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Chunk Size : 512K
>
> Name : lamachine:129 (local to host lamachine)
> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
> Events : 0
>
> Number Major Minor RaidDevice State
> 0 8 50 0 active sync /dev/sdd2
> 1 8 66 1 active sync /dev/sde2
> 2 8 82 2 active sync /dev/sdf2
>
> Is there any way to recover the contents of these two arrays ? :(
>
> On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> However I'm noticing that the details with this new MB are somewhat different:
>>
>> [root@lamachine ~]# cat /etc/mdadm.conf
>> # mdadm.conf written out by anaconda
>> MAILADDR root
>> AUTO +imsm +1.x -all
>> ARRAY /dev/md2 level=raid5 num-devices=3
>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>> ARRAY /dev/md126 level=raid10 num-devices=2
>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>> ARRAY /dev/md127 level=raid0 num-devices=3
>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>> [root@lamachine ~]# mdadm --detail /dev/md1*
>> /dev/md126:
>> Version : 0.90
>> Creation Time : Thu Dec 3 22:12:12 2009
>> Raid Level : raid10
>> Array Size : 30719936 (29.30 GiB 31.46 GB)
>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>> Raid Devices : 2
>> Total Devices : 2
>> Preferred Minor : 126
>> Persistence : Superblock is persistent
>>
>> Update Time : Tue Jan 12 04:03:41 2016
>> State : clean
>> Active Devices : 2
>> Working Devices : 2
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Layout : near=2
>> Chunk Size : 64K
>>
>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>> Events : 0.264152
>>
>> Number Major Minor RaidDevice State
>> 0 8 82 0 active sync set-A /dev/sdf2
>> 1 8 1 1 active sync set-B /dev/sda1
>> /dev/md127:
>> Version : 1.2
>> Creation Time : Tue Jul 26 19:00:28 2011
>> Raid Level : raid0
>> Array Size : 94367232 (90.00 GiB 96.63 GB)
>> Raid Devices : 3
>> Total Devices : 3
>> Persistence : Superblock is persistent
>>
>> Update Time : Tue Jul 26 19:00:28 2011
>> State : clean
>> Active Devices : 3
>> Working Devices : 3
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Chunk Size : 512K
>>
>> Name : reading.homeunix.com:3
>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>> Events : 0
>>
>> Number Major Minor RaidDevice State
>> 0 8 85 0 active sync /dev/sdf5
>> 1 8 21 1 active sync /dev/sdb5
>> 2 8 5 2 active sync /dev/sda5
>> /dev/md128:
>> Version : 1.2
>> Raid Level : raid0
>> Total Devices : 1
>> Persistence : Superblock is persistent
>>
>> State : inactive
>>
>> Name : lamachine:128 (local to host lamachine)
>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>> Events : 4154
>>
>> Number Major Minor RaidDevice
>>
>> - 8 49 - /dev/sdd1
>> /dev/md129:
>> Version : 1.2
>> Raid Level : raid0
>> Total Devices : 1
>> Persistence : Superblock is persistent
>>
>> State : inactive
>>
>> Name : lamachine:129 (local to host lamachine)
>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>> Events : 0
>>
>> Number Major Minor RaidDevice
>>
>> - 8 50 - /dev/sdd2
>> [root@lamachine ~]# mdadm --detail /dev/md2*
>> /dev/md2:
>> Version : 0.90
>> Creation Time : Mon Feb 11 07:54:36 2013
>> Raid Level : raid5
>> Array Size : 511999872 (488.28 GiB 524.29 GB)
>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>> Raid Devices : 3
>> Total Devices : 3
>> Preferred Minor : 2
>> Persistence : Superblock is persistent
>>
>> Update Time : Tue Jan 12 02:31:50 2016
>> State : clean
>> Active Devices : 3
>> Working Devices : 3
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>> Events : 0.611
>>
>> Number Major Minor RaidDevice State
>> 0 8 83 0 active sync /dev/sdf3
>> 1 8 18 1 active sync /dev/sdb2
>> 2 8 2 2 active sync /dev/sda2
>> [root@lamachine ~]# cat /proc/mdstat
>> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
>> md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
>> 94367232 blocks super 1.2 512k chunks
>>
>> md129 : inactive sdd2[2](S)
>> 524156928 blocks super 1.2
>>
>> md128 : inactive sdd1[3](S)
>> 2147352576 blocks super 1.2
>>
>> md126 : active raid10 sdf2[0] sda1[1]
>> 30719936 blocks 2 near-copies [2/2] [UU]
>>
>> unused devices: <none>
>> [root@lamachine ~]#
>>
>> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> ok, system up and running after MB was replaced however the arrays
>>> remain inactive.
>>>
>>> mdadm version is:
>>> mdadm - v3.3.4 - 3rd August 2015
>>>
>>> Here's the output from Phil's lsdrv:
>>>
>>> [root@lamachine ~]# ./lsdrv
>>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
>>> chipset 6-Port SATA AHCI Controller (rev 06)
>>> ├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379}
>>> │└sda 465.76g [8:0] Partitioned (dos)
>>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │ │ PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ │ └VG vg_bigblackbox 29.29g 1.26g free
>>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
>>> │ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4
>>> {b08d7f5e-f15f-4241-804e-edccecab6003}
>>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
>>> │ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4
>>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
>>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
>>> │ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
>>> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
>>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
>>> │ │ └dm-1 8.50g [253:1] LV LogVol_var ext4
>>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
>>> │ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
>>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ │ └Mounted as /dev/md2 @ /home
>>> │ ├sda3 1.00k [8:3] Partitioned (dos)
>>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │ │ PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
>>> │ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
>>> │ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
>>> │ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
>>> │ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
>>> │ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
>>> │ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
>>> │ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
>>> │ └sda6 3.39g [8:6] Empty/Unknown
>>> ├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185}
>>> │└sdb 465.76g [8:16] Partitioned (dos)
>>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
>>> │ ├sdb4 1.00k [8:20] Partitioned (dos)
>>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │ PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdb6 3.39g [8:22] Empty/Unknown
>>> ├scsi 2:x:x:x [Empty]
>>> ├scsi 3:x:x:x [Empty]
>>> ├scsi 4:x:x:x [Empty]
>>> └scsi 5:x:x:x [Empty]
>>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
>>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
>>> ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
>>> │└sdc 2.73t [8:32] Partitioned (PMBR)
>>> ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
>>> │└sdd 2.73t [8:48] Partitioned (gpt)
>>> │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128'
>>> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
>>> │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None
>>> {f2372cb9:d3816fd6:ce86d826:882ec82e}
>>> │ │ Empty/Unknown
>>> │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129'
>>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
>>> │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None
>>> {895dae98:d1a496de:4f590b8b:cb8ac12a}
>>> │ Empty/Unknown
>>> ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906}
>>> │└sde 2.73t [8:64] Partitioned (PMBR)
>>> ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724}
>>> │└sdf 465.76g [8:80] Partitioned (dos)
>>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
>>> │ │└Mounted as /dev/sdf1 @ /boot
>>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │ PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdf4 1.00k [8:84] Partitioned (dos)
>>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │ PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdf6 3.39g [8:86] Empty/Unknown
>>> ├scsi 10:x:x:x [Empty]
>>> ├scsi 11:x:x:x [Empty]
>>> └scsi 12:x:x:x [Empty]
>>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
>>> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
>>> └scsi 14:x:x:x [Empty]
>>> [root@lamachine ~]#
>>>
>>> Thanks in advance for any recommendations on what steps to take in
>>> order to bring these arrays back online.
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>>
>>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>>> Thanks very much for the response Wol.
>>>>
>>>> It looks like the PSU is dead (server automatically powers off a few
>>>> seconds after power on).
>>>>
>>>> I'm planning to order a PSU replacement to resume troubleshooting so
>>>> please bear with me; maybe the PSU was degraded and couldn't power
>>>> some of the drives?
>>>>
>>>> Cheers,
>>>>
>>>> Daniel
>>>>
>>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>>>> Just a quick first response. I see md128 and md129 are both down, and
>>>>> are both listed as one drive, raid0. Bit odd, that ...
>>>>>
>>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>>>> that would split an array in two. Is it possible that you should have
>>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>>>
>>>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>>>
>>>>> How much do you know about what the setup should be, and why it was set
>>>>> up that way?
>>>>>
>>>>> Download lsdrv by Phil Turmel (it requires python2.7; if your machine is
>>>>> python3, a quick fix to the shebang at the start should get it to work).
>>>>> Post the output from that here.
>>>>>
>>>>> Cheers,
>>>>> Wol
>>>>>
>>>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have a box that I believe was not powered down correctly and after
>>>>>> transporting it to a different location it doesn't boot anymore
>>>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>>>
>>>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>>>>
>>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> #
>>>>>> # /etc/fstab
>>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>>>> #
>>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>>>> #
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4
>>>>>> defaults 1 1
>>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4
>>>>>> defaults 1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4
>>>>>> defaults 1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4
>>>>>> defaults 1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4
>>>>>> defaults 1 2
>>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap
>>>>>> defaults 0 0
>>>>>> /dev/md2 /home ext4 defaults 1 2
>>>>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2
>>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>>>> [root@lamachine ~]#
>>>>>>
>>>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>>>> inactive, but I'm not sure how to safely activate them, so I'm looking for
>>>>>> some knowledgeable advice on how to proceed here.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> Below some more relevant outputs:
>>>>>>
>>>>>> [root@lamachine ~]# cat /proc/mdstat
>>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>>>> 94367232 blocks super 1.2 512k chunks
>>>>>>
>>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>>>
>>>>>> md128 : inactive sdf1[3](S)
>>>>>> 2147352576 blocks super 1.2
>>>>>>
>>>>>> md129 : inactive sdf2[2](S)
>>>>>> 524156928 blocks super 1.2
>>>>>>
>>>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>>>> 30719936 blocks 2 near-copies [2/2] [UU]
>>>>>>
>>>>>> unused devices: <none>
>>>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>>>> # mdadm.conf written out by anaconda
>>>>>> MAILADDR root
>>>>>> AUTO +imsm +1.x -all
>>>>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>>>> /dev/md126:
>>>>>> Version : 0.90
>>>>>> Creation Time : Thu Dec 3 22:12:12 2009
>>>>>> Raid Level : raid10
>>>>>> Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>> Raid Devices : 2
>>>>>> Total Devices : 2
>>>>>> Preferred Minor : 126
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> Update Time : Tue Aug 2 07:46:39 2016
>>>>>> State : clean
>>>>>> Active Devices : 2
>>>>>> Working Devices : 2
>>>>>> Failed Devices : 0
>>>>>> Spare Devices : 0
>>>>>>
>>>>>> Layout : near=2
>>>>>> Chunk Size : 64K
>>>>>>
>>>>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>> Events : 0.264152
>>>>>>
>>>>>> Number Major Minor RaidDevice State
>>>>>> 0 8 2 0 active sync set-A /dev/sda2
>>>>>> 1 8 33 1 active sync set-B /dev/sdc1
>>>>>> /dev/md127:
>>>>>> Version : 1.2
>>>>>> Creation Time : Tue Jul 26 19:00:28 2011
>>>>>> Raid Level : raid0
>>>>>> Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>>>> Raid Devices : 3
>>>>>> Total Devices : 3
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> Update Time : Tue Jul 26 19:00:28 2011
>>>>>> State : clean
>>>>>> Active Devices : 3
>>>>>> Working Devices : 3
>>>>>> Failed Devices : 0
>>>>>> Spare Devices : 0
>>>>>>
>>>>>> Chunk Size : 512K
>>>>>>
>>>>>> Name : reading.homeunix.com:3
>>>>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>> Events : 0
>>>>>>
>>>>>> Number Major Minor RaidDevice State
>>>>>> 0 8 5 0 active sync /dev/sda5
>>>>>> 1 8 21 1 active sync /dev/sdb5
>>>>>> 2 8 37 2 active sync /dev/sdc5
>>>>>> /dev/md128:
>>>>>> Version : 1.2
>>>>>> Raid Level : raid0
>>>>>> Total Devices : 1
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> State : inactive
>>>>>>
>>>>>> Name : lamachine:128 (local to host lamachine)
>>>>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>> Events : 4154
>>>>>>
>>>>>> Number Major Minor RaidDevice
>>>>>>
>>>>>> - 8 81 - /dev/sdf1
>>>>>> /dev/md129:
>>>>>> Version : 1.2
>>>>>> Raid Level : raid0
>>>>>> Total Devices : 1
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> State : inactive
>>>>>>
>>>>>> Name : lamachine:129 (local to host lamachine)
>>>>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>> Events : 0
>>>>>>
>>>>>> Number Major Minor RaidDevice
>>>>>>
>>>>>> - 8 82 - /dev/sdf2
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>>>> /dev/md2:
>>>>>> Version : 0.90
>>>>>> Creation Time : Mon Feb 11 07:54:36 2013
>>>>>> Raid Level : raid5
>>>>>> Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>>>> Raid Devices : 3
>>>>>> Total Devices : 3
>>>>>> Preferred Minor : 2
>>>>>> Persistence : Superblock is persistent
>>>>>>
>>>>>> Update Time : Mon Aug 1 20:24:23 2016
>>>>>> State : clean
>>>>>> Active Devices : 3
>>>>>> Working Devices : 3
>>>>>> Failed Devices : 0
>>>>>> Spare Devices : 0
>>>>>>
>>>>>> Layout : left-symmetric
>>>>>> Chunk Size : 64K
>>>>>>
>>>>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>>>> Events : 0.611
>>>>>>
>>>>>> Number Major Minor RaidDevice State
>>>>>> 0 8 3 0 active sync /dev/sda3
>>>>>> 1 8 18 1 active sync /dev/sdb2
>>>>>> 2 8 34 2 active sync /dev/sdc2
>>>>>> [root@lamachine ~]#
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-12 19:41 UTC
To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji3HdHaJAvwtYNU=Ykc_qohBkfFfrbP0M=FNhuFMK+d-Jg@mail.gmail.com>
ok, I just adjusted system time so that I can start tracking logs.
What I'm noticing, however, is that fdisk -l is not giving me the expected
partitions (I was expecting at least 2 partitions on every 2.7 TiB disk,
similar to what I have on sdd):
[root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde}
Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Device Boot Start End Sectors Size Id Type
/dev/sdc1 1 4294967295 4294967295 2T ee GPT
Partition 1 does not start on physical sector boundary.
Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9
Device Start End Sectors Size Type
/dev/sdd1 2048 4294969343 4294967296 2T Linux RAID
/dev/sdd2 4294969344 5343545343 1048576000 500G Linux filesystem
Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Device Boot Start End Sectors Size Id Type
/dev/sde1 1 4294967295 4294967295 2T ee GPT
Partition 1 does not start on physical sector boundary.
[root@lamachine lamachine_220315]#
what could've happened here? any ideas why the partition tables ended
up like that?
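Some of the numbers in the fdisk output above can be cross-checked with plain arithmetic. The single type "ee" entry of 4294967295 sectors on sdc and sde is the protective MBR that accompanies a GPT: an MBR entry stores its size in 32 bits, so that value is simply the largest it can record. The output also shows sdc and sde reporting fewer sectors than sdd, although all three are the same model. GPT keeps its backup header in the last sector of the disk, so if these two disks were partitioned while they reported the larger size (an assumption, nothing here confirms it), the backup would no longer be where the tools look for it. The sketch below only reproduces the arithmetic; the function names are made up for illustration.

```shell
#!/bin/bash
# All figures are copied from the fdisk output above; units are 512-byte sectors.
sdc_sectors=5860531055     # the same count is reported for sde
sdd_sectors=5860533168

# How many sectors short is the first disk compared with the second?
sector_gap() { echo $(( $2 - $1 )); }

# Largest sector count a 32-bit MBR partition entry can record.
mbr_max_sectors() { echo $(( (1 << 32) - 1 )); }

# Convert 512-byte sectors to bytes.
sectors_to_bytes() { echo $(( $1 * 512 )); }

echo "sdc/sde vs sdd gap: $(sector_gap "$sdc_sectors" "$sdd_sectors") sectors"
echo "MBR entry limit:    $(mbr_max_sectors) sectors"
echo "sdd1 size:          $(sectors_to_bytes 4294967296) bytes"
```

The sdd1 figure comes out at exactly 2 TiB, which is the 2199GB end offset that parted -l reports for that partition.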
From previous information I have an idea of what md128 and md129
are supposed to look like (I also noticed that the device names
changed):
# md128 and md129 details From an old command output
/dev/md128:
Version : 1.2
Creation Time : Fri Oct 24 15:24:38 2014
Raid Level : raid5
Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Mar 22 06:20:08 2015
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : lamachine:128 (local to host lamachine)
UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
Events : 4041
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/sdd1
1 8 65 1 active sync /dev/sde1
3 8 81 2 active sync /dev/sdf1
/dev/md129:
Version : 1.2
Creation Time : Mon Nov 10 16:28:11 2014
Raid Level : raid0
Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Nov 10 16:28:11 2014
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : lamachine:129 (local to host lamachine)
UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
Events : 0
Number Major Minor RaidDevice State
0 8 50 0 active sync /dev/sdd2
1 8 66 1 active sync /dev/sde2
2 8 82 2 active sync /dev/sdf2
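As a sanity check on the old output above, the array sizes follow from the member sizes, assuming all members of an array are the same size: RAID5 gives up one member's worth of space to parity, RAID0 uses all of it. The helper names raid5_kib and raid0_kib are hypothetical and exist only for this sketch.

```shell
#!/bin/bash
# raid5_kib / raid0_kib are hypothetical helper names for this check.
# Arguments: per-member size in KiB, number of member devices.
raid5_kib() { echo $(( $1 * ($2 - 1) )); }   # one member's worth holds parity
raid0_kib() { echo $(( $1 * $2 )); }         # plain striping, no redundancy

# md128: Used Dev Size 2147352576, 3 raid devices (from the old output)
raid5_kib 2147352576 3
# md129: 524156928 blocks, the size /proc/mdstat lists for sdd2, 3 raid devices
raid0_kib 524156928 3
```

Both results should equal the "Array Size" values in the old output, which is consistent with each array having had three equal members, of which only the one on sdd is currently being found.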
Is there any way to recover the contents of these two arrays ? :(
On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> However I'm noticing that the details with this new MB are somewhat different:
>
> [root@lamachine ~]# cat /etc/mdadm.conf
> # mdadm.conf written out by anaconda
> MAILADDR root
> AUTO +imsm +1.x -all
> ARRAY /dev/md2 level=raid5 num-devices=3
> UUID=2cff15d1:e411447b:fd5d4721:03e44022
> ARRAY /dev/md126 level=raid10 num-devices=2
> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
> ARRAY /dev/md127 level=raid0 num-devices=3
> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
> [root@lamachine ~]# mdadm --detail /dev/md1*
> /dev/md126:
> Version : 0.90
> Creation Time : Thu Dec 3 22:12:12 2009
> Raid Level : raid10
> Array Size : 30719936 (29.30 GiB 31.46 GB)
> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 126
> Persistence : Superblock is persistent
>
> Update Time : Tue Jan 12 04:03:41 2016
> State : clean
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : near=2
> Chunk Size : 64K
>
> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
> Events : 0.264152
>
> Number Major Minor RaidDevice State
> 0 8 82 0 active sync set-A /dev/sdf2
> 1 8 1 1 active sync set-B /dev/sda1
> /dev/md127:
> Version : 1.2
> Creation Time : Tue Jul 26 19:00:28 2011
> Raid Level : raid0
> Array Size : 94367232 (90.00 GiB 96.63 GB)
> Raid Devices : 3
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Tue Jul 26 19:00:28 2011
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Chunk Size : 512K
>
> Name : reading.homeunix.com:3
> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
> Events : 0
>
> Number Major Minor RaidDevice State
> 0 8 85 0 active sync /dev/sdf5
> 1 8 21 1 active sync /dev/sdb5
> 2 8 5 2 active sync /dev/sda5
> /dev/md128:
> Version : 1.2
> Raid Level : raid0
> Total Devices : 1
> Persistence : Superblock is persistent
>
> State : inactive
>
> Name : lamachine:128 (local to host lamachine)
> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
> Events : 4154
>
> Number Major Minor RaidDevice
>
> - 8 49 - /dev/sdd1
> /dev/md129:
> Version : 1.2
> Raid Level : raid0
> Total Devices : 1
> Persistence : Superblock is persistent
>
> State : inactive
>
> Name : lamachine:129 (local to host lamachine)
> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
> Events : 0
>
> Number Major Minor RaidDevice
>
> - 8 50 - /dev/sdd2
> [root@lamachine ~]# mdadm --detail /dev/md2*
> /dev/md2:
> Version : 0.90
> Creation Time : Mon Feb 11 07:54:36 2013
> Raid Level : raid5
> Array Size : 511999872 (488.28 GiB 524.29 GB)
> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 2
> Persistence : Superblock is persistent
>
> Update Time : Tue Jan 12 02:31:50 2016
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
> Events : 0.611
>
> Number Major Minor RaidDevice State
> 0 8 83 0 active sync /dev/sdf3
> 1 8 18 1 active sync /dev/sdb2
> 2 8 2 2 active sync /dev/sda2
> [root@lamachine ~]# cat /proc/mdstat
> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
> md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
> 94367232 blocks super 1.2 512k chunks
>
> md129 : inactive sdd2[2](S)
> 524156928 blocks super 1.2
>
> md128 : inactive sdd1[3](S)
> 2147352576 blocks super 1.2
>
> md126 : active raid10 sdf2[0] sda1[1]
> 30719936 blocks 2 near-copies [2/2] [UU]
>
> unused devices: <none>
> [root@lamachine ~]#
>
> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> ok, system up and running after MB was replaced however the arrays
>> remain inactive.
>>
>> mdadm version is:
>> mdadm - v3.3.4 - 3rd August 2015
>>
>> Here's the output from Phil's lsdrv:
>>
>> [root@lamachine ~]# ./lsdrv
>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
>> chipset 6-Port SATA AHCI Controller (rev 06)
>> ├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379}
>> │└sda 465.76g [8:0] Partitioned (dos)
>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>> │ │ │ PV LVM2_member 28.03g used, 1.26g free
>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>> │ │ └VG vg_bigblackbox 29.29g 1.26g free
>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
>> │ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4
>> {b08d7f5e-f15f-4241-804e-edccecab6003}
>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
>> │ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4
>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
>> │ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
>> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
>> │ │ └dm-1 8.50g [253:1] LV LogVol_var ext4
>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
>> │ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ │ └Mounted as /dev/md2 @ /home
>> │ ├sda3 1.00k [8:3] Partitioned (dos)
>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │ │ PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
>> │ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
>> │ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
>> │ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
>> │ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
>> │ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
>> │ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
>> │ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
>> │ └sda6 3.39g [8:6] Empty/Unknown
>> ├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185}
>> │└sdb 465.76g [8:16] Partitioned (dos)
>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
>> │ ├sdb4 1.00k [8:20] Partitioned (dos)
>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │ PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ └sdb6 3.39g [8:22] Empty/Unknown
>> ├scsi 2:x:x:x [Empty]
>> ├scsi 3:x:x:x [Empty]
>> ├scsi 4:x:x:x [Empty]
>> └scsi 5:x:x:x [Empty]
>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
>> ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
>> │└sdc 2.73t [8:32] Partitioned (PMBR)
>> ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
>> │└sdd 2.73t [8:48] Partitioned (gpt)
>> │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128'
>> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
>> │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None
>> {f2372cb9:d3816fd6:ce86d826:882ec82e}
>> │ │ Empty/Unknown
>> │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129'
>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
>> │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None
>> {895dae98:d1a496de:4f590b8b:cb8ac12a}
>> │ Empty/Unknown
>> ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906}
>> │└sde 2.73t [8:64] Partitioned (PMBR)
>> ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724}
>> │└sdf 465.76g [8:80] Partitioned (dos)
>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
>> │ │└Mounted as /dev/sdf1 @ /boot
>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>> │ │ PV LVM2_member 28.03g used, 1.26g free
>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
>> {2cff15d1-e411-447b-fd5d-472103e44022}
>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>> {2cff15d1:e411447b:fd5d4721:03e44022}
>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>> │ ├sdf4 1.00k [8:84] Partitioned (dos)
>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>> │ │ PV LVM2_member 86.00g used, 3.99g free
>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>> │ └sdf6 3.39g [8:86] Empty/Unknown
>> ├scsi 10:x:x:x [Empty]
>> ├scsi 11:x:x:x [Empty]
>> └scsi 12:x:x:x [Empty]
>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
>> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
>> └scsi 14:x:x:x [Empty]
>> [root@lamachine ~]#
>>
>> Thanks in advance for any recommendations on what steps to take in
>> order to bring these arrays back online.
>>
>> Regards,
>>
>> Daniel
>>
>>
>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> Thanks very much for the response Wol.
>>>
>>> It looks like the PSU is dead (server automatically powers off a few
>>> seconds after power on).
>>>
>>> I'm planning to order a PSU replacement to resume troubleshooting so
>>> please bear with me; maybe the PSU was degraded and couldn't power
>>> some of the drives?
>>>
>>> Cheers,
>>>
>>> Daniel
>>>
>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>>> Just a quick first response. I see md128 and md129 are both down, and
>>>> are both listed as one drive, raid0. Bit odd, that ...
>>>>
>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>>> that would split an array in two. Is it possible that you should have
>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>>
>>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>>
>>>> How much do you know about what the setup should be, and why it was set
>>>> up that way?
>>>>
>>>> Download lsdrv by Phil Turmel (it requires python2.7, if your machine is
>>>> python3 a quick fix to the shebang at the start should get it to work).
>>>> Post the output from that here.
>>>>
>>>> Cheers,
>>>> Wol
>>>>
>>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>>> Hi All,
>>>>>
>>>>> I have a box that I believe was not powered down correctly and after
>>>>> transporting it to a different location it doesn't boot anymore
>>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>>
>>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>>>
>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> #
>>>>> # /etc/fstab
>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>>> #
>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>>> #
>>>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4
>>>>> defaults 1 1
>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4
>>>>> defaults 1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4
>>>>> defaults 1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4
>>>>> defaults 1 2
>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4
>>>>> defaults 1 2
>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap
>>>>> defaults 0 0
>>>>> /dev/md2 /home ext4 defaults 1 2
>>>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2
>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>>> [root@lamachine ~]#
>>>>>
>>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>>> inactive, but not sure how to safely activate these so looking for
>>>>> some knowledgeable advice on how to proceed here.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Daniel
>>>>>
>>>>> Below some more relevant outputs:
>>>>>
>>>>> [root@lamachine ~]# cat /proc/mdstat
>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>>> 94367232 blocks super 1.2 512k chunks
>>>>>
>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>>
>>>>> md128 : inactive sdf1[3](S)
>>>>> 2147352576 blocks super 1.2
>>>>>
>>>>> md129 : inactive sdf2[2](S)
>>>>> 524156928 blocks super 1.2
>>>>>
>>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>>> 30719936 blocks 2 near-copies [2/2] [UU]
>>>>>
>>>>> unused devices: <none>
>>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>>> # mdadm.conf written out by anaconda
>>>>> MAILADDR root
>>>>> AUTO +imsm +1.x -all
>>>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>>> /dev/md126:
>>>>> Version : 0.90
>>>>> Creation Time : Thu Dec 3 22:12:12 2009
>>>>> Raid Level : raid10
>>>>> Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>>> Raid Devices : 2
>>>>> Total Devices : 2
>>>>> Preferred Minor : 126
>>>>> Persistence : Superblock is persistent
>>>>>
>>>>> Update Time : Tue Aug 2 07:46:39 2016
>>>>> State : clean
>>>>> Active Devices : 2
>>>>> Working Devices : 2
>>>>> Failed Devices : 0
>>>>> Spare Devices : 0
>>>>>
>>>>> Layout : near=2
>>>>> Chunk Size : 64K
>>>>>
>>>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>>> Events : 0.264152
>>>>>
>>>>> Number Major Minor RaidDevice State
>>>>> 0 8 2 0 active sync set-A /dev/sda2
>>>>> 1 8 33 1 active sync set-B /dev/sdc1
>>>>> /dev/md127:
>>>>> Version : 1.2
>>>>> Creation Time : Tue Jul 26 19:00:28 2011
>>>>> Raid Level : raid0
>>>>> Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>>> Raid Devices : 3
>>>>> Total Devices : 3
>>>>> Persistence : Superblock is persistent
>>>>>
>>>>> Update Time : Tue Jul 26 19:00:28 2011
>>>>> State : clean
>>>>> Active Devices : 3
>>>>> Working Devices : 3
>>>>> Failed Devices : 0
>>>>> Spare Devices : 0
>>>>>
>>>>> Chunk Size : 512K
>>>>>
>>>>> Name : reading.homeunix.com:3
>>>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>>> Events : 0
>>>>>
>>>>> Number Major Minor RaidDevice State
>>>>> 0 8 5 0 active sync /dev/sda5
>>>>> 1 8 21 1 active sync /dev/sdb5
>>>>> 2 8 37 2 active sync /dev/sdc5
>>>>> /dev/md128:
>>>>> Version : 1.2
>>>>> Raid Level : raid0
>>>>> Total Devices : 1
>>>>> Persistence : Superblock is persistent
>>>>>
>>>>> State : inactive
>>>>>
>>>>> Name : lamachine:128 (local to host lamachine)
>>>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>> Events : 4154
>>>>>
>>>>> Number Major Minor RaidDevice
>>>>>
>>>>> - 8 81 - /dev/sdf1
>>>>> /dev/md129:
>>>>> Version : 1.2
>>>>> Raid Level : raid0
>>>>> Total Devices : 1
>>>>> Persistence : Superblock is persistent
>>>>>
>>>>> State : inactive
>>>>>
>>>>> Name : lamachine:129 (local to host lamachine)
>>>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>> Events : 0
>>>>>
>>>>> Number Major Minor RaidDevice
>>>>>
>>>>> - 8 82 - /dev/sdf2
>>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>>> /dev/md2:
>>>>> Version : 0.90
>>>>> Creation Time : Mon Feb 11 07:54:36 2013
>>>>> Raid Level : raid5
>>>>> Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>>> Raid Devices : 3
>>>>> Total Devices : 3
>>>>> Preferred Minor : 2
>>>>> Persistence : Superblock is persistent
>>>>>
>>>>> Update Time : Mon Aug 1 20:24:23 2016
>>>>> State : clean
>>>>> Active Devices : 3
>>>>> Working Devices : 3
>>>>> Failed Devices : 0
>>>>> Spare Devices : 0
>>>>>
>>>>> Layout : left-symmetric
>>>>> Chunk Size : 64K
>>>>>
>>>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>>> Events : 0.611
>>>>>
>>>>> Number Major Minor RaidDevice State
>>>>> 0 8 3 0 active sync /dev/sda3
>>>>> 1 8 18 1 active sync /dev/sdb2
>>>>> 2 8 34 2 active sync /dev/sdc2
>>>>> [root@lamachine ~]#
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
* Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").
From: Christophe JAILLET @ 2016-09-12 19:09 UTC (permalink / raw)
To: shli, linux-raid, linux-kernel
Hi,
I'm puzzled by commit f9a67b1182e5 ("md/bitmap: clear bitmap if
bitmap_create failed").
Part of the commit is:
@@ -1865,8 +1866,10 @@ int bitmap_copy_from_slot(struct mddev *mddev,
int slot,
struct bitmap_counts *counts;
struct bitmap *bitmap = bitmap_create(mddev, slot);
- if (IS_ERR(bitmap))
+ if (IS_ERR(bitmap)) {
+ bitmap_free(bitmap);
return PTR_ERR(bitmap);
+ }
but if 'bitmap' is an error, I think that bad things will happen in
'bitmap_free()' when, at the beginning of the function, we will execute:
if (bitmap->sysfs_can_clear) <-----------------
sysfs_put(bitmap->sysfs_can_clear);
However, the commit log message is really explicit and adding this call
to 'bitmap_free' has really been done on purpose. ("If bitmap_create
returns an error, we need to call either bitmap_destroy or bitmap_free
to do clean up, ...")
It is also not consistent with the comment before function bitmap_create():
* if this returns an error, bitmap_destroy must be called to do
clean up
* once mddev->bitmap is set
I may have missed something, but I don't see what.
Is this commit correct?
Best regards,
CJ
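To illustrate the concern, here is a minimal userspace sketch of the error-pointer convention. The ERR_PTR/PTR_ERR/IS_ERR helpers below are simplified stand-ins modelled on the kernel's, and struct demo_bitmap and demo_bitmap_free() are hypothetical names made up for this example; they are not the md code. It only shows that an error pointer is an encoded errno, not an object, so any cleanup routine has to test for it before touching a field.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, cut-down stand-in for struct bitmap (one field only). */
struct demo_bitmap {
	void *sysfs_can_clear;
};

/*
 * Hypothetical cleanup helper, NOT the kernel's bitmap_free(): it refuses
 * to dereference NULL or an error pointer. Returns 1 if it cleaned up.
 */
static int demo_bitmap_free(struct demo_bitmap *bitmap)
{
	if (bitmap == NULL || IS_ERR(bitmap))
		return 0;
	bitmap->sysfs_can_clear = NULL;
	return 1;
}
```

Calling demo_bitmap_free(ERR_PTR(-ENOMEM)) returns 0 without touching memory, whereas a helper that reads bitmap->sysfs_can_clear first would be dereferencing an address near the top of the address space.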
* Re: [PATCH v3] mdadm: fix a buffer overflow
From: Jes Sorensen @ 2016-09-12 16:51 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, shli
In-Reply-To: <1473358867-4114379-1-git-send-email-songliubraving@fb.com>
Song Liu <songliubraving@fb.com> writes:
> struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
> is 33B long. So we need strncpy instead of strcpy to avoid buffer
> overflow.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> super1.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Applied thanks!
Note there is at least one place with a str operation hardcoding the
length of set_name to 32. Would you mind fixing that too?
Cheers,
Jes
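As a rough sketch of the point about hardcoded lengths: the structs below are hypothetical cut-down stand-ins (only the 32B and 33B field sizes come from the patch description), and demo_copy_name() is an invented helper, not the code in super1.c. It shows one way to bound the copy with sizeof on the source field rather than a literal 32, and to terminate the destination explicitly.

```c
#include <string.h>

/* Hypothetical cut-down structs; only the field sizes come from the patch text. */
struct demo_sb {
	char set_name[32];	/* may use all 32 bytes, so not always NUL-terminated */
};

struct demo_info {
	char name[33];		/* one byte larger, leaving room for the terminator */
};

/* Copy the fixed-width name and always terminate the destination. */
static void demo_copy_name(struct demo_info *info, const struct demo_sb *sb)
{
	/* sizeof keeps the length tied to the field instead of a literal 32. */
	strncpy(info->name, sb->set_name, sizeof(sb->set_name));
	info->name[sizeof(sb->set_name)] = '\0';
}
```

With this shape, changing the size of set_name would not leave a stale constant behind, provided name stays at least one byte larger.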
* Re: dm: Return correct value in retry loop
From: Mike Snitzer @ 2016-09-12 15:01 UTC (permalink / raw)
To: Minfei Huang; +Cc: agk, shli, linux-raid, dm-devel, linux-kernel
In-Reply-To: <20160912013906.GA24268@MinfeideMacBook-Pro.local>
Thanks for the patch, I've picked it up as a stable@ fix for either
4.8-rc7 or when the 4.9 merge window opens (I'm leaning toward the latter
since this issue has been around since 3.19 was released and there
aren't any known problems/reports related to this oversight).
Please see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=7735f936a13d79bbaead1723e4532e65d4c4cf01
On Sun, Sep 11 2016 at 9:39pm -0400,
Minfei Huang <mnghuan@gmail.com> wrote:
> Ping.
>
> Any comment is appreciated.
>
> Thanks
> Minfei
>
> On 09/06/16 at 04:00P, Minfei Huang wrote:
> > dm_resume will return silently in retry loop's failure. Assign a correct
> > return value in the failed loop.
> >
> > Remove a useless assignment as well.
> >
> > Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> > ---
> > drivers/md/dm.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index fa9b1cb..c935cc8 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
> >
> > int dm_resume(struct mapped_device *md)
> > {
> > - int r = -EINVAL;
> > + int r;
> > struct dm_table *map = NULL;
> >
> > retry:
> > + r = -EINVAL;
> > mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
> >
> > if (!dm_suspended_md(md))
> > @@ -2277,10 +2278,8 @@ retry:
> >
> > clear_bit(DMF_SUSPENDED, &md->flags);
> >
> > - r = 0;
> > out:
> > mutex_unlock(&md->suspend_lock);
> > -
> > return r;
> > }
> >
> > --
> > 2.7.4 (Apple Git-66)
> >
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Artur Paszkiewicz @ 2016-09-12 10:58 UTC (permalink / raw)
To: Yi Zhang, Shaohua Li; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <a543dd7a-b582-98fa-3ba8-67500396c766@redhat.com>
On 09/12/2016 10:03 AM, Yi Zhang wrote:
> Hello Artur
> With your patch, no "md: export_rdev(sde)" is printed after creating the raid10.
>
> I found another problem and am not sure whether it is expected behavior; could you help confirm it? Thanks.
> When I create one container with 4 disks [1] and then create one raid10 with 3 disks (sd[b-d]) + 1 missing [2], it ends up binding the fourth disk, sde [3].
>
> [1] mdadm -CR /dev/md0 /dev/sd[b-e] -n4 -e imsm
> [2] mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
> [3] # cat /proc/mdstat
> Personalities : [raid10]
> md127 : active raid10 sde[4] sdd[2] sdc[1] sdb[0]
> 1024000 blocks super external:/md0/0 128K chunks 2 near-copies [4/4] [UUUU]
>
> md0 : inactive sde[3](S) sdd[2](S) sdc[1](S) sdb[0](S)
> 4420 blocks super external:imsm
>
> unused devices: <none>
I think that this is correct behavior. Because there is a spare disk
available in the container, it is used for rebuilding the volume. This
is equivalent to:
mdadm -CR /dev/md0 /dev/sd[b-d] -n3 -e imsm
mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
mdadm -a /dev/md0 /dev/sde
* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Yi Zhang @ 2016-09-12 8:03 UTC (permalink / raw)
To: Artur Paszkiewicz, Shaohua Li; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <7910bc85-f9c4-1ea3-76a6-40b819738537@intel.com>
On 09/09/2016 08:56 PM, Artur Paszkiewicz wrote:
> On 09/09/2016 12:56 AM, Shaohua Li wrote:
>> On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
>>> Hello
>>>
>>> I tried to create an IMSM RAID10 with one device missing and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
>>>
>>> Steps I used:
>>> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
>>> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
>>>
>>> Version:
>>> 4.8.0-rc5
>>> mdadm - v3.4-84-gbd1fd72 - 25th August 2016
>> Can't reproduce with old mdadm, but can with upstream mdadm. It looks like mdadm
>> keeps writing the new_dev sysfs entry.
>>
>> Jes, any idea?
>>
>> Thanks,
>> Shaohua
>>> Log:
>>> http://pastebin.com/FJJwvgg6
>>>
>>> <6>[ 301.102007] md: bind<sdb>
>>> <6>[ 301.102095] md: bind<sdc>
>>> <6>[ 301.102159] md: bind<sdd>
>>> <6>[ 301.102215] md: bind<sde>
>>> <6>[ 301.102291] md: bind<sdf>
>>> <6>[ 301.103010] ata3.00: Enabling discard_zeroes_data
>>> <6>[ 311.714344] ata3.00: Enabling discard_zeroes_data
>>> <6>[ 311.721866] md: bind<sdb>
>>> <6>[ 311.721965] md: bind<sdc>
>>> <6>[ 311.722029] md: bind<sdd>
>>> <5>[ 311.733165] md/raid10:md127: not clean -- starting background reconstruction
>>> <6>[ 311.733167] md/raid10:md127: active with 3 out of 4 devices
>>> <6>[ 311.733186] md127: detected capacity change from 0 to 240060989440
>>> <6>[ 311.774027] md: bind<sde>
>>> <6>[ 311.810664] md: md127 switched to read-write mode.
>>> <6>[ 311.819885] md: resync of RAID array md127
>>> <6>[ 311.819886] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>>> <6>[ 311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
>>> <6>[ 311.819891] md: using 128k window, over a total of 234435328k.
>>> <6>[ 316.606073] ata3.00: Enabling discard_zeroes_data
>>> <6>[ 343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
>>> <6>[ 1482.314944] md: md127: resync done.
>>> <7>[ 1482.315086] RAID10 conf printout:
>>> <7>[ 1482.315087] --- wd:3 rd:4
>>> <7>[ 1482.315089] disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 1482.315089] disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 1482.315090] disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 1482.315099] RAID10 conf printout:
>>> <7>[ 1482.315099] --- wd:3 rd:4
>>> <7>[ 1482.315100] disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 1482.315100] disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 1482.315101] disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 1482.315101] disk 3, wo:1, o:1, dev:sde
>>> <6>[ 1482.315220] md: recovery of RAID array md127
>>> <6>[ 1482.315221] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>>> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
>>> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
>>> <6>[ 2697.184217] md: md127: recovery done.
>>> <7>[ 2697.524143] RAID10 conf printout:
>>> <7>[ 2697.524144] --- wd:4 rd:4
>>> <7>[ 2697.524146] disk 0, wo:0, o:1, dev:sdb
>>> <7>[ 2697.524146] disk 1, wo:0, o:1, dev:sdc
>>> <7>[ 2697.524147] disk 2, wo:0, o:1, dev:sdd
>>> <7>[ 2697.524148] disk 3, wo:0, o:1, dev:sde
>>> <6>[ 2697.524632] md: export_rdev(sde)
>>> <6>[ 2697.549452] md: export_rdev(sde)
>>> <6>[ 2697.568763] md: export_rdev(sde)
>>> <6>[ 2697.587938] md: export_rdev(sde)
>>> <6>[ 2697.607271] md: export_rdev(sde)
>>> <6>[ 2697.626321] md: export_rdev(sde)
>>> <6>[ 2697.645676] md: export_rdev(sde)
>>> <6>[ 2697.663211] md: export_rdev(sde)
>>> <6>[ 2697.681603] md: export_rdev(sde)
>>> <6>[ 2697.699117] md: export_rdev(sde)
>>> <6>[ 2697.716510] md: export_rdev(sde)
>>>
>>> Best Regards,
>>> Yi Zhang
> Can you check if this fix works for you? If it does I'll send a proper
> patch for this.
Hello Artur
With your patch, no "md: export_rdev(sde)" is printed after creating the raid10.
I found another problem and am not sure whether it is expected behavior;
could you help confirm it? Thanks.
When I create one container with 4 disks [1] and then create one raid10
with 3 disks (sd[b-d]) + 1 missing [2], it ends up binding the fourth
disk, sde [3].
[1] mdadm -CR /dev/md0 /dev/sd[b-e] -n4 -e imsm
[2] mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing --size=500M
[3] # cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sde[4] sdd[2] sdc[1] sdb[0]
1024000 blocks super external:/md0/0 128K chunks 2 near-copies
[4/4] [UUUU]
md0 : inactive sde[3](S) sdd[2](S) sdc[1](S) sdb[0](S)
4420 blocks super external:imsm
unused devices: <none>
> Thanks,
> Artur
>
> diff --git a/super-intel.c b/super-intel.c
> index 92817e9..ffa71f6 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -7789,6 +7789,9 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
> IMSM_T_STATE_DEGRADED)
> return NULL;
>
> + if (get_imsm_map(dev, MAP_0)->map_state == IMSM_T_STATE_UNINITIALIZED)
> + return NULL;
> +
> /*
> * If there are any failed disks check state of the other volume.
> * Block rebuild if the another one is failed until failed disks
* Re: [PATCH] dm: Return correct value in retry loop
From: Minfei Huang @ 2016-09-12 1:39 UTC (permalink / raw)
To: agk, snitzer, shli; +Cc: dm-devel, linux-raid, linux-kernel
In-Reply-To: <1473148829-3317-1-git-send-email-mnghuan@gmail.com>
Ping.
Any comment is appreciated.
Thanks
Minfei
On 09/06/16 at 04:00P, Minfei Huang wrote:
> dm_resume will return silently in retry loop's failure. Assign a correct
> return value in the failed loop.
>
> Remove a useless assignment as well.
>
> Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> ---
> drivers/md/dm.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index fa9b1cb..c935cc8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
>
> int dm_resume(struct mapped_device *md)
> {
> - int r = -EINVAL;
> + int r;
> struct dm_table *map = NULL;
>
> retry:
> + r = -EINVAL;
> mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
>
> if (!dm_suspended_md(md))
> @@ -2277,10 +2278,8 @@ retry:
>
> clear_bit(DMF_SUSPENDED, &md->flags);
>
> - r = 0;
> out:
> mutex_unlock(&md->suspend_lock);
> -
> return r;
> }
>
> --
> 2.7.4 (Apple Git-66)
>
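For readers following along, here is a small userspace sketch of the kind of problem the patch description refers to. The quoted diff does not show the loop body, so this is an assumption about its shape: a wait call stores 0 into r before jumping back to the retry label, and a later early exit then returns that stale 0. All names (struct demo_dev, demo_wait, demo_resume_old, demo_resume_new) are hypothetical and exist only for this illustration.

```c
#include <errno.h>

/* Hypothetical device state, used only for this illustration. */
struct demo_dev {
	int suspended;			/* non-zero while the device is suspended */
	int internally_suspended;	/* non-zero while an internal suspend is pending */
};

/*
 * Hypothetical wait: pretend the internal suspend ended and someone else
 * resumed the device in the meantime. Returns 0 for "wait succeeded".
 */
static int demo_wait(struct demo_dev *d)
{
	d->internally_suspended = 0;
	d->suspended = 0;
	return 0;
}

/* r is set once, so the 0 stored by demo_wait() survives the retry. */
static int demo_resume_old(struct demo_dev *d)
{
	int r = -EINVAL;

retry:
	if (!d->suspended)
		goto out;	/* may return a stale 0 after a retry */
	if (d->internally_suspended) {
		r = demo_wait(d);
		if (r)
			return r;
		goto retry;
	}
	d->suspended = 0;
	r = 0;			/* stand-in for a successful resume */
out:
	return r;
}

/* r is reset on every pass, mirroring the change in the quoted patch. */
static int demo_resume_new(struct demo_dev *d)
{
	int r;

retry:
	r = -EINVAL;
	if (!d->suspended)
		goto out;
	if (d->internally_suspended) {
		r = demo_wait(d);
		if (r)
			return r;
		goto retry;
	}
	d->suspended = 0;
	r = 0;			/* stand-in for a successful resume */
out:
	return r;
}
```

Starting from a device that is both suspended and internally suspended, demo_resume_old() reports success even though it never performed the resume itself, while demo_resume_new() reports -EINVAL on the second pass.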
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 20:06 UTC (permalink / raw)
To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji3C4Cygs=xh4d4PtREp9mGSBjNS0o7SatW_QotzYShA_Q@mail.gmail.com>
However I'm noticing that the details with this new MB are somewhat different:
[root@lamachine ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md2 level=raid5 num-devices=3
UUID=2cff15d1:e411447b:fd5d4721:03e44022
ARRAY /dev/md126 level=raid10 num-devices=2
UUID=9af006ca:8845bbd3:bfe78010:bc810f04
ARRAY /dev/md127 level=raid0 num-devices=3
UUID=acd5374f:72628c93:6a906c4b:5f675ce5
ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
ARRAY /dev/md129 metadata=1.2 name=lamachine:129
UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
[root@lamachine ~]# mdadm --detail /dev/md1*
/dev/md126:
Version : 0.90
Creation Time : Thu Dec 3 22:12:12 2009
Raid Level : raid10
Array Size : 30719936 (29.30 GiB 31.46 GB)
Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 126
Persistence : Superblock is persistent
Update Time : Tue Jan 12 04:03:41 2016
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 64K
UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
Events : 0.264152
Number Major Minor RaidDevice State
0 8 82 0 active sync set-A /dev/sdf2
1 8 1 1 active sync set-B /dev/sda1
/dev/md127:
Version : 1.2
Creation Time : Tue Jul 26 19:00:28 2011
Raid Level : raid0
Array Size : 94367232 (90.00 GiB 96.63 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Tue Jul 26 19:00:28 2011
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : reading.homeunix.com:3
UUID : acd5374f:72628c93:6a906c4b:5f675ce5
Events : 0
Number Major Minor RaidDevice State
0 8 85 0 active sync /dev/sdf5
1 8 21 1 active sync /dev/sdb5
2 8 5 2 active sync /dev/sda5
/dev/md128:
Version : 1.2
Raid Level : raid0
Total Devices : 1
Persistence : Superblock is persistent
State : inactive
Name : lamachine:128 (local to host lamachine)
UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
Events : 4154
Number Major Minor RaidDevice
- 8 49 - /dev/sdd1
/dev/md129:
Version : 1.2
Raid Level : raid0
Total Devices : 1
Persistence : Superblock is persistent
State : inactive
Name : lamachine:129 (local to host lamachine)
UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
Events : 0
Number Major Minor RaidDevice
- 8 50 - /dev/sdd2
[root@lamachine ~]# mdadm --detail /dev/md2*
/dev/md2:
Version : 0.90
Creation Time : Mon Feb 11 07:54:36 2013
Raid Level : raid5
Array Size : 511999872 (488.28 GiB 524.29 GB)
Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Jan 12 02:31:50 2016
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
Events : 0.611
Number Major Minor RaidDevice State
0 8 83 0 active sync /dev/sdf3
1 8 18 1 active sync /dev/sdb2
2 8 2 2 active sync /dev/sda2
[root@lamachine ~]# cat /proc/mdstat
Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
94367232 blocks super 1.2 512k chunks
md129 : inactive sdd2[2](S)
524156928 blocks super 1.2
md128 : inactive sdd1[3](S)
2147352576 blocks super 1.2
md126 : active raid10 sdf2[0] sda1[1]
30719936 blocks 2 near-copies [2/2] [UU]
unused devices: <none>
[root@lamachine ~]#
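In case it helps anyone scripting a check for this state, below is a minimal sketch that counts the arrays reported as inactive in /proc/mdstat-style text. demo_count_inactive() is a hypothetical helper, not part of mdadm, and it assumes array lines have the "mdNNN : inactive ..." shape seen in the output above; it takes the text as a string, so reading the file is left to the caller.

```c
#include <string.h>

/*
 * Count arrays whose state is "inactive" in /proc/mdstat-style text.
 * Array lines are assumed to look like "md128 : inactive sdd1[3](S)".
 */
static int demo_count_inactive(const char *text)
{
	static const char key[] = " : inactive";
	int count = 0;

	while (*text != '\0') {
		const char *end = strchr(text, '\n');
		size_t len = end ? (size_t)(end - text) : strlen(text);

		/* Only lines that start with "md" describe an array. */
		if (len >= 2 && strncmp(text, "md", 2) == 0) {
			const char *name_end = memchr(text, ' ', len);

			if (name_end != NULL &&
			    (size_t)(text + len - name_end) >= sizeof(key) - 1 &&
			    strncmp(name_end, key, sizeof(key) - 1) == 0)
				count++;
		}
		/* Move to the next line, or stop at the end of the text. */
		text = end ? end + 1 : text + len;
	}
	return count;
}
```

Fed the mdstat output above, it would count md128 and md129 and skip md2, md126 and md127.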
On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> ok, system up and running after MB was replaced however the arrays
> remain inactive.
>
> mdadm version is:
> mdadm - v3.3.4 - 3rd August 2015
>
> Here's the output from Phil's lsdrv:
>
> [root@lamachine ~]# ./lsdrv
> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
> chipset 6-Port SATA AHCI Controller (rev 06)
> ├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379}
> │└sda 465.76g [8:0] Partitioned (dos)
> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
> {9af006ca:8845bbd3:bfe78010:bc810f04}
> │ │ │ PV LVM2_member 28.03g used, 1.26g free
> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
> │ │ └VG vg_bigblackbox 29.29g 1.26g free
> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
> │ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4
> {b08d7f5e-f15f-4241-804e-edccecab6003}
> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
> │ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4
> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
> │ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
> │ │ └dm-1 8.50g [253:1] LV LogVol_var ext4
> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
> │ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
> {2cff15d1-e411-447b-fd5d-472103e44022}
> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
> {2cff15d1:e411447b:fd5d4721:03e44022}
> │ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
> │ │ └Mounted as /dev/md2 @ /home
> │ ├sda3 1.00k [8:3] Partitioned (dos)
> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
> │ │ │ PV LVM2_member 86.00g used, 3.99g free
> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
> │ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
> │ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
> │ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
> │ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
> │ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
> │ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
> │ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
> │ └sda6 3.39g [8:6] Empty/Unknown
> ├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185}
> │└sdb 465.76g [8:16] Partitioned (dos)
> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
> {2cff15d1-e411-447b-fd5d-472103e44022}
> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
> {2cff15d1:e411447b:fd5d4721:03e44022}
> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
> │ ├sdb4 1.00k [8:20] Partitioned (dos)
> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
> │ │ PV LVM2_member 86.00g used, 3.99g free
> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
> │ └sdb6 3.39g [8:22] Empty/Unknown
> ├scsi 2:x:x:x [Empty]
> ├scsi 3:x:x:x [Empty]
> ├scsi 4:x:x:x [Empty]
> └scsi 5:x:x:x [Empty]
> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
> ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
> │└sdc 2.73t [8:32] Partitioned (PMBR)
> ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
> │└sdd 2.73t [8:48] Partitioned (gpt)
> │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128'
> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
> │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None
> {f2372cb9:d3816fd6:ce86d826:882ec82e}
> │ │ Empty/Unknown
> │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129'
> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
> │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None
> {895dae98:d1a496de:4f590b8b:cb8ac12a}
> │ Empty/Unknown
> ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906}
> │└sde 2.73t [8:64] Partitioned (PMBR)
> ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724}
> │└sdf 465.76g [8:80] Partitioned (dos)
> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
> │ │└Mounted as /dev/sdf1 @ /boot
> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
> {9af006ca:8845bbd3:bfe78010:bc810f04}
> │ │ PV LVM2_member 28.03g used, 1.26g free
> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
> {2cff15d1-e411-447b-fd5d-472103e44022}
> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
> {2cff15d1:e411447b:fd5d4721:03e44022}
> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
> │ ├sdf4 1.00k [8:84] Partitioned (dos)
> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
> │ │ PV LVM2_member 86.00g used, 3.99g free
> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
> │ └sdf6 3.39g [8:86] Empty/Unknown
> ├scsi 10:x:x:x [Empty]
> ├scsi 11:x:x:x [Empty]
> └scsi 12:x:x:x [Empty]
> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
> └scsi 14:x:x:x [Empty]
> [root@lamachine ~]#
>
> Thanks in advance for any recommendations on what steps to take in
> order to bring these arrays back online.
>
> Regards,
>
> Daniel
>
>
> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> Thanks very much for the response Wol.
>>
>> It looks like the PSU is dead (server automatically powers off a few
>> seconds after power on).
>>
>> I'm planning to order a PSU replacement to resume troubleshooting so
>> please bear with me; maybe the PSU was degraded and couldn't power
>> some of the drives?
>>
>> Cheers,
>>
>> Daniel
>>
>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>> Just a quick first response. I see md128 and md129 are both down, and
>>> are both listed as one drive, raid0. Bit odd, that ...
>>>
>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>> that would split an array in two. Is it possible that you should have
>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>
>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>
>>> How much do you know about what the setup should be, and why it was set
>>> up that way?
>>>
>>> Download lsdrv by Phil Turmel (it requires python2.7; if your machine is
>>> python3, a quick fix to the shebang at the start should get it to work).
>>> Post the output from that here.
>>>
>>> Cheers,
>>> Wol
>>>
>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>> Hi All,
>>>>
>>>> I have a box that I believe was not powered down correctly and after
>>>> transporting it to a different location it doesn't boot anymore
>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>
>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>>
>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> #
>>>> # /etc/fstab
>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>> #
>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>> #
>>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4
>>>> defaults 1 1
>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4
>>>> defaults 1 2
>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4
>>>> defaults 1 2
>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4
>>>> defaults 1 2
>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4
>>>> defaults 1 2
>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap
>>>> defaults 0 0
>>>> /dev/md2 /home ext4 defaults 1 2
>>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2
>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>> [root@lamachine ~]#
>>>>
>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>> inactive, but not sure how to safely activate these so looking for
>>>> some knowledgeable advice on how to proceed here.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Daniel
>>>>
>>>> Below some more relevant outputs:
>>>>
>>>> [root@lamachine ~]# cat /proc/mdstat
>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>> 94367232 blocks super 1.2 512k chunks
>>>>
>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>
>>>> md128 : inactive sdf1[3](S)
>>>> 2147352576 blocks super 1.2
>>>>
>>>> md129 : inactive sdf2[2](S)
>>>> 524156928 blocks super 1.2
>>>>
>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>> 30719936 blocks 2 near-copies [2/2] [UU]
>>>>
>>>> unused devices: <none>
>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>> # mdadm.conf written out by anaconda
>>>> MAILADDR root
>>>> AUTO +imsm +1.x -all
>>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>> /dev/md126:
>>>> Version : 0.90
>>>> Creation Time : Thu Dec 3 22:12:12 2009
>>>> Raid Level : raid10
>>>> Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>> Raid Devices : 2
>>>> Total Devices : 2
>>>> Preferred Minor : 126
>>>> Persistence : Superblock is persistent
>>>>
>>>> Update Time : Tue Aug 2 07:46:39 2016
>>>> State : clean
>>>> Active Devices : 2
>>>> Working Devices : 2
>>>> Failed Devices : 0
>>>> Spare Devices : 0
>>>>
>>>> Layout : near=2
>>>> Chunk Size : 64K
>>>>
>>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>> Events : 0.264152
>>>>
>>>> Number Major Minor RaidDevice State
>>>> 0 8 2 0 active sync set-A /dev/sda2
>>>> 1 8 33 1 active sync set-B /dev/sdc1
>>>> /dev/md127:
>>>> Version : 1.2
>>>> Creation Time : Tue Jul 26 19:00:28 2011
>>>> Raid Level : raid0
>>>> Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>> Raid Devices : 3
>>>> Total Devices : 3
>>>> Persistence : Superblock is persistent
>>>>
>>>> Update Time : Tue Jul 26 19:00:28 2011
>>>> State : clean
>>>> Active Devices : 3
>>>> Working Devices : 3
>>>> Failed Devices : 0
>>>> Spare Devices : 0
>>>>
>>>> Chunk Size : 512K
>>>>
>>>> Name : reading.homeunix.com:3
>>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>> Events : 0
>>>>
>>>> Number Major Minor RaidDevice State
>>>> 0 8 5 0 active sync /dev/sda5
>>>> 1 8 21 1 active sync /dev/sdb5
>>>> 2 8 37 2 active sync /dev/sdc5
>>>> /dev/md128:
>>>> Version : 1.2
>>>> Raid Level : raid0
>>>> Total Devices : 1
>>>> Persistence : Superblock is persistent
>>>>
>>>> State : inactive
>>>>
>>>> Name : lamachine:128 (local to host lamachine)
>>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>> Events : 4154
>>>>
>>>> Number Major Minor RaidDevice
>>>>
>>>> - 8 81 - /dev/sdf1
>>>> /dev/md129:
>>>> Version : 1.2
>>>> Raid Level : raid0
>>>> Total Devices : 1
>>>> Persistence : Superblock is persistent
>>>>
>>>> State : inactive
>>>>
>>>> Name : lamachine:129 (local to host lamachine)
>>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>> Events : 0
>>>>
>>>> Number Major Minor RaidDevice
>>>>
>>>> - 8 82 - /dev/sdf2
>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>> /dev/md2:
>>>> Version : 0.90
>>>> Creation Time : Mon Feb 11 07:54:36 2013
>>>> Raid Level : raid5
>>>> Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>> Raid Devices : 3
>>>> Total Devices : 3
>>>> Preferred Minor : 2
>>>> Persistence : Superblock is persistent
>>>>
>>>> Update Time : Mon Aug 1 20:24:23 2016
>>>> State : clean
>>>> Active Devices : 3
>>>> Working Devices : 3
>>>> Failed Devices : 0
>>>> Spare Devices : 0
>>>>
>>>> Layout : left-symmetric
>>>> Chunk Size : 64K
>>>>
>>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>> Events : 0.611
>>>>
>>>> Number Major Minor RaidDevice State
>>>> 0 8 3 0 active sync /dev/sda3
>>>> 1 8 18 1 active sync /dev/sdb2
>>>> 2 8 34 2 active sync /dev/sdc2
>>>> [root@lamachine ~]#
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 18:48 UTC (permalink / raw)
To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji0BQuyKQbLYZ9Ah16hHday_seTaajm8LOn6+HRFqinTyQ@mail.gmail.com>
OK, the system is up and running after the MB was replaced; however, the
arrays remain inactive.
mdadm version is:
mdadm - v3.3.4 - 3rd August 2015
Here's the output from Phil's lsdrv:
[root@lamachine ~]# ./lsdrv
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
chipset 6-Port SATA AHCI Controller (rev 06)
├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379}
│└sda 465.76g [8:0] Partitioned (dos)
│ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
{9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
{9af006ca:8845bbd3:bfe78010:bc810f04}
│ │ │ PV LVM2_member 28.03g used, 1.26g free
{cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ │ └VG vg_bigblackbox 29.29g 1.26g free
{VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
│ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4
{b08d7f5e-f15f-4241-804e-edccecab6003}
│ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
│ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4
{4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
│ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
│ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
{f6b46363-170b-4038-83bd-2c5f9f6a1973}
│ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
│ │ └dm-1 8.50g [253:1] LV LogVol_var ext4
{ab165c61-3d62-4c55-8639-6c2c2bf4b021}
│ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
│ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
{2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
{2cff15d1:e411447b:fd5d4721:03e44022}
│ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ │ └Mounted as /dev/md2 @ /home
│ ├sda3 1.00k [8:3] Partitioned (dos)
│ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
(None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ │ PV LVM2_member 86.00g used, 3.99g free
{VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
│ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
│ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
│ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
│ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
│ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
│ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
│ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
│ └sda6 3.39g [8:6] Empty/Unknown
├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185}
│└sdb 465.76g [8:16] Partitioned (dos)
│ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
{2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
{2cff15d1:e411447b:fd5d4721:03e44022}
│ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
│ ├sdb4 1.00k [8:20] Partitioned (dos)
│ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
(None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ PV LVM2_member 86.00g used, 3.99g free
{VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdb6 3.39g [8:22] Empty/Unknown
├scsi 2:x:x:x [Empty]
├scsi 3:x:x:x [Empty]
├scsi 4:x:x:x [Empty]
└scsi 5:x:x:x [Empty]
PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
│└sdc 2.73t [8:32] Partitioned (PMBR)
├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
│└sdd 2.73t [8:48] Partitioned (gpt)
│ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128'
{f2372cb9-d381-6fd6-ce86-d826882ec82e}
│ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None
{f2372cb9:d3816fd6:ce86d826:882ec82e}
│ │ Empty/Unknown
│ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129'
{895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
│ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None
{895dae98:d1a496de:4f590b8b:cb8ac12a}
│ Empty/Unknown
├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906}
│└sde 2.73t [8:64] Partitioned (PMBR)
├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724}
│└sdf 465.76g [8:80] Partitioned (dos)
│ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
│ │└Mounted as /dev/sdf1 @ /boot
│ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
{9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
{9af006ca:8845bbd3:bfe78010:bc810f04}
│ │ PV LVM2_member 28.03g used, 1.26g free
{cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
{2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
{2cff15d1:e411447b:fd5d4721:03e44022}
│ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdf4 1.00k [8:84] Partitioned (dos)
│ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
(None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ PV LVM2_member 86.00g used, 3.99g free
{VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdf6 3.39g [8:86] Empty/Unknown
├scsi 10:x:x:x [Empty]
├scsi 11:x:x:x [Empty]
└scsi 12:x:x:x [Empty]
PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
C602 chipset 4-Port SATA Storage Control Unit (rev 06)
└scsi 14:x:x:x [Empty]
[root@lamachine ~]#
Thanks in advance for any recommendations on what steps to take in
order to bring these arrays back online.
Regards,
Daniel
On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> Thanks very much for the response Wol.
>
> It looks like the PSU is dead (server automatically powers off a few
> seconds after power on).
>
> I'm planning to order a PSU replacement to resume troubleshooting so
> please bear with me; maybe the PSU was degraded and couldn't power
> some of the drives?
>
> Cheers,
>
> Daniel
>
> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>> Just a quick first response. I see md128 and md129 are both down, and
>> are both listed as one drive, raid0. Bit odd, that ...
>>
>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>> that would split an array in two. Is it possible that you should have
>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>
>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>
>> How much do you know about what the setup should be, and why it was set
>> up that way?
>>
>> Download lsdrv by Phil Turmel (it requires python2.7; if your machine is
>> python3, a quick fix to the shebang at the start should get it to work).
>> Post the output from that here.
>>
>> Cheers,
>> Wol
>>
>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>> Hi All,
>>>
>>> I have a box that I believe was not powered down correctly and after
>>> transporting it to a different location it doesn't boot anymore
>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>
>>> The box has 6 drives and after instructing the BIOS to boot from the
>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>
>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> #
>>> # /etc/fstab
>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>> #
>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>> #
>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4
>>> defaults 1 1
>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4
>>> defaults 1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4
>>> defaults 1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4
>>> defaults 1 2
>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4
>>> defaults 1 2
>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap
>>> defaults 0 0
>>> /dev/md2 /home ext4 defaults 1 2
>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2
>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>> [root@lamachine ~]#
>>>
>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>> inactive, but not sure how to safely activate these so looking for
>>> some knowledgeable advice on how to proceed here.
>>>
>>> Thanks in advance,
>>>
>>> Daniel
>>>
>>> Below some more relevant outputs:
>>>
>>> [root@lamachine ~]# cat /proc/mdstat
>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>> 94367232 blocks super 1.2 512k chunks
>>>
>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>
>>> md128 : inactive sdf1[3](S)
>>> 2147352576 blocks super 1.2
>>>
>>> md129 : inactive sdf2[2](S)
>>> 524156928 blocks super 1.2
>>>
>>> md126 : active raid10 sda2[0] sdc1[1]
>>> 30719936 blocks 2 near-copies [2/2] [UU]
>>>
>>> unused devices: <none>
>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>> # mdadm.conf written out by anaconda
>>> MAILADDR root
>>> AUTO +imsm +1.x -all
>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>> /dev/md126:
>>> Version : 0.90
>>> Creation Time : Thu Dec 3 22:12:12 2009
>>> Raid Level : raid10
>>> Array Size : 30719936 (29.30 GiB 31.46 GB)
>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>> Raid Devices : 2
>>> Total Devices : 2
>>> Preferred Minor : 126
>>> Persistence : Superblock is persistent
>>>
>>> Update Time : Tue Aug 2 07:46:39 2016
>>> State : clean
>>> Active Devices : 2
>>> Working Devices : 2
>>> Failed Devices : 0
>>> Spare Devices : 0
>>>
>>> Layout : near=2
>>> Chunk Size : 64K
>>>
>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>> Events : 0.264152
>>>
>>> Number Major Minor RaidDevice State
>>> 0 8 2 0 active sync set-A /dev/sda2
>>> 1 8 33 1 active sync set-B /dev/sdc1
>>> /dev/md127:
>>> Version : 1.2
>>> Creation Time : Tue Jul 26 19:00:28 2011
>>> Raid Level : raid0
>>> Array Size : 94367232 (90.00 GiB 96.63 GB)
>>> Raid Devices : 3
>>> Total Devices : 3
>>> Persistence : Superblock is persistent
>>>
>>> Update Time : Tue Jul 26 19:00:28 2011
>>> State : clean
>>> Active Devices : 3
>>> Working Devices : 3
>>> Failed Devices : 0
>>> Spare Devices : 0
>>>
>>> Chunk Size : 512K
>>>
>>> Name : reading.homeunix.com:3
>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>> Events : 0
>>>
>>> Number Major Minor RaidDevice State
>>> 0 8 5 0 active sync /dev/sda5
>>> 1 8 21 1 active sync /dev/sdb5
>>> 2 8 37 2 active sync /dev/sdc5
>>> /dev/md128:
>>> Version : 1.2
>>> Raid Level : raid0
>>> Total Devices : 1
>>> Persistence : Superblock is persistent
>>>
>>> State : inactive
>>>
>>> Name : lamachine:128 (local to host lamachine)
>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>> Events : 4154
>>>
>>> Number Major Minor RaidDevice
>>>
>>> - 8 81 - /dev/sdf1
>>> /dev/md129:
>>> Version : 1.2
>>> Raid Level : raid0
>>> Total Devices : 1
>>> Persistence : Superblock is persistent
>>>
>>> State : inactive
>>>
>>> Name : lamachine:129 (local to host lamachine)
>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>> Events : 0
>>>
>>> Number Major Minor RaidDevice
>>>
>>> - 8 82 - /dev/sdf2
>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>> /dev/md2:
>>> Version : 0.90
>>> Creation Time : Mon Feb 11 07:54:36 2013
>>> Raid Level : raid5
>>> Array Size : 511999872 (488.28 GiB 524.29 GB)
>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>> Raid Devices : 3
>>> Total Devices : 3
>>> Preferred Minor : 2
>>> Persistence : Superblock is persistent
>>>
>>> Update Time : Mon Aug 1 20:24:23 2016
>>> State : clean
>>> Active Devices : 3
>>> Working Devices : 3
>>> Failed Devices : 0
>>> Spare Devices : 0
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>> Events : 0.611
>>>
>>> Number Major Minor RaidDevice State
>>> 0 8 3 0 active sync /dev/sda3
>>> 1 8 18 1 active sync /dev/sdb2
>>> 2 8 34 2 active sync /dev/sdc2
>>> [root@lamachine ~]#
>>>
>>
* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Artur Paszkiewicz @ 2016-09-09 12:56 UTC (permalink / raw)
To: Shaohua Li, Yi Zhang; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <20160908225607.GA66921@kernel.org>
On 09/09/2016 12:56 AM, Shaohua Li wrote:
> On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
>> Hello
>>
>> I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
>>
>> Steps I used:
>> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
>> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
>>
>> Version:
>> 4.8.0-rc5
>> mdadm - v3.4-84-gbd1fd72 - 25th August 2016
>
> I can't reproduce this with old mdadm, but I can with upstream mdadm. It
> looks like mdadm keeps writing the new_dev sysfs entry.
>
> Jes, any idea?
>
> Thanks,
> Shaohua
>> Log:
>> http://pastebin.com/FJJwvgg6
>>
>> <6>[ 301.102007] md: bind<sdb>
>> <6>[ 301.102095] md: bind<sdc>
>> <6>[ 301.102159] md: bind<sdd>
>> <6>[ 301.102215] md: bind<sde>
>> <6>[ 301.102291] md: bind<sdf>
>> <6>[ 301.103010] ata3.00: Enabling discard_zeroes_data
>> <6>[ 311.714344] ata3.00: Enabling discard_zeroes_data
>> <6>[ 311.721866] md: bind<sdb>
>> <6>[ 311.721965] md: bind<sdc>
>> <6>[ 311.722029] md: bind<sdd>
>> <5>[ 311.733165] md/raid10:md127: not clean -- starting background reconstruction
>> <6>[ 311.733167] md/raid10:md127: active with 3 out of 4 devices
>> <6>[ 311.733186] md127: detected capacity change from 0 to 240060989440
>> <6>[ 311.774027] md: bind<sde>
>> <6>[ 311.810664] md: md127 switched to read-write mode.
>> <6>[ 311.819885] md: resync of RAID array md127
>> <6>[ 311.819886] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>> <6>[ 311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
>> <6>[ 311.819891] md: using 128k window, over a total of 234435328k.
>> <6>[ 316.606073] ata3.00: Enabling discard_zeroes_data
>> <6>[ 343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
>> <6>[ 1482.314944] md: md127: resync done.
>> <7>[ 1482.315086] RAID10 conf printout:
>> <7>[ 1482.315087] --- wd:3 rd:4
>> <7>[ 1482.315089] disk 0, wo:0, o:1, dev:sdb
>> <7>[ 1482.315089] disk 1, wo:0, o:1, dev:sdc
>> <7>[ 1482.315090] disk 2, wo:0, o:1, dev:sdd
>> <7>[ 1482.315099] RAID10 conf printout:
>> <7>[ 1482.315099] --- wd:3 rd:4
>> <7>[ 1482.315100] disk 0, wo:0, o:1, dev:sdb
>> <7>[ 1482.315100] disk 1, wo:0, o:1, dev:sdc
>> <7>[ 1482.315101] disk 2, wo:0, o:1, dev:sdd
>> <7>[ 1482.315101] disk 3, wo:1, o:1, dev:sde
>> <6>[ 1482.315220] md: recovery of RAID array md127
>> <6>[ 1482.315221] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
>> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
>> <6>[ 2697.184217] md: md127: recovery done.
>> <7>[ 2697.524143] RAID10 conf printout:
>> <7>[ 2697.524144] --- wd:4 rd:4
>> <7>[ 2697.524146] disk 0, wo:0, o:1, dev:sdb
>> <7>[ 2697.524146] disk 1, wo:0, o:1, dev:sdc
>> <7>[ 2697.524147] disk 2, wo:0, o:1, dev:sdd
>> <7>[ 2697.524148] disk 3, wo:0, o:1, dev:sde
>> <6>[ 2697.524632] md: export_rdev(sde)
>> <6>[ 2697.549452] md: export_rdev(sde)
>> <6>[ 2697.568763] md: export_rdev(sde)
>> <6>[ 2697.587938] md: export_rdev(sde)
>> <6>[ 2697.607271] md: export_rdev(sde)
>> <6>[ 2697.626321] md: export_rdev(sde)
>> <6>[ 2697.645676] md: export_rdev(sde)
>> <6>[ 2697.663211] md: export_rdev(sde)
>> <6>[ 2697.681603] md: export_rdev(sde)
>> <6>[ 2697.699117] md: export_rdev(sde)
>> <6>[ 2697.716510] md: export_rdev(sde)
>>
>> Best Regards,
>> Yi Zhang
Can you check if this fix works for you? If it does I'll send a proper
patch for this.
Thanks,
Artur
diff --git a/super-intel.c b/super-intel.c
index 92817e9..ffa71f6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7789,6 +7789,9 @@ static struct mdinfo *imsm_activate_spare(struct active_array *a,
IMSM_T_STATE_DEGRADED)
return NULL;
+ if (get_imsm_map(dev, MAP_0)->map_state == IMSM_T_STATE_UNINITIALIZED)
+ return NULL;
+
/*
* If there are any failed disks check state of the other volume.
* Block rebuild if the another one is failed until failed disks
* Re: lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Shaohua Li @ 2016-09-08 22:56 UTC (permalink / raw)
To: Yi Zhang; +Cc: linux-raid, Jes.Sorensen
In-Reply-To: <1648084319.7702644.1473230621059.JavaMail.zimbra@redhat.com>
On Wed, Sep 07, 2016 at 02:43:41AM -0400, Yi Zhang wrote:
> Hello
>
> I tried to create an IMSM RAID10 with a missing device and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
>
> Steps I used:
> mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
> mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
>
> Version:
> 4.8.0-rc5
> mdadm - v3.4-84-gbd1fd72 - 25th August 2016
I can't reproduce this with old mdadm, but I can with upstream mdadm. It
looks like mdadm keeps writing the new_dev sysfs entry.
Jes, any idea?
Thanks,
Shaohua
> Log:
> http://pastebin.com/FJJwvgg6
>
> <6>[ 301.102007] md: bind<sdb>
> <6>[ 301.102095] md: bind<sdc>
> <6>[ 301.102159] md: bind<sdd>
> <6>[ 301.102215] md: bind<sde>
> <6>[ 301.102291] md: bind<sdf>
> <6>[ 301.103010] ata3.00: Enabling discard_zeroes_data
> <6>[ 311.714344] ata3.00: Enabling discard_zeroes_data
> <6>[ 311.721866] md: bind<sdb>
> <6>[ 311.721965] md: bind<sdc>
> <6>[ 311.722029] md: bind<sdd>
> <5>[ 311.733165] md/raid10:md127: not clean -- starting background reconstruction
> <6>[ 311.733167] md/raid10:md127: active with 3 out of 4 devices
> <6>[ 311.733186] md127: detected capacity change from 0 to 240060989440
> <6>[ 311.774027] md: bind<sde>
> <6>[ 311.810664] md: md127 switched to read-write mode.
> <6>[ 311.819885] md: resync of RAID array md127
> <6>[ 311.819886] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> <6>[ 311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> <6>[ 311.819891] md: using 128k window, over a total of 234435328k.
> <6>[ 316.606073] ata3.00: Enabling discard_zeroes_data
> <6>[ 343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
> <6>[ 1482.314944] md: md127: resync done.
> <7>[ 1482.315086] RAID10 conf printout:
> <7>[ 1482.315087] --- wd:3 rd:4
> <7>[ 1482.315089] disk 0, wo:0, o:1, dev:sdb
> <7>[ 1482.315089] disk 1, wo:0, o:1, dev:sdc
> <7>[ 1482.315090] disk 2, wo:0, o:1, dev:sdd
> <7>[ 1482.315099] RAID10 conf printout:
> <7>[ 1482.315099] --- wd:3 rd:4
> <7>[ 1482.315100] disk 0, wo:0, o:1, dev:sdb
> <7>[ 1482.315100] disk 1, wo:0, o:1, dev:sdc
> <7>[ 1482.315101] disk 2, wo:0, o:1, dev:sdd
> <7>[ 1482.315101] disk 3, wo:1, o:1, dev:sde
> <6>[ 1482.315220] md: recovery of RAID array md127
> <6>[ 1482.315221] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> <6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> <6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
> <6>[ 2697.184217] md: md127: recovery done.
> <7>[ 2697.524143] RAID10 conf printout:
> <7>[ 2697.524144] --- wd:4 rd:4
> <7>[ 2697.524146] disk 0, wo:0, o:1, dev:sdb
> <7>[ 2697.524146] disk 1, wo:0, o:1, dev:sdc
> <7>[ 2697.524147] disk 2, wo:0, o:1, dev:sdd
> <7>[ 2697.524148] disk 3, wo:0, o:1, dev:sde
> <6>[ 2697.524632] md: export_rdev(sde)
> <6>[ 2697.549452] md: export_rdev(sde)
> <6>[ 2697.568763] md: export_rdev(sde)
> <6>[ 2697.587938] md: export_rdev(sde)
> <6>[ 2697.607271] md: export_rdev(sde)
> <6>[ 2697.626321] md: export_rdev(sde)
> <6>[ 2697.645676] md: export_rdev(sde)
> <6>[ 2697.663211] md: export_rdev(sde)
> <6>[ 2697.681603] md: export_rdev(sde)
> <6>[ 2697.699117] md: export_rdev(sde)
> <6>[ 2697.716510] md: export_rdev(sde)
>
> Best Regards,
> Yi Zhang
>
>
^ permalink raw reply
* [PATCH v3] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 18:21 UTC (permalink / raw)
To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu
struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead of strcpy to avoid buffer
overflow.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
super1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/super1.c b/super1.c
index f3e4023..9f62d23 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, info->name);
} else
- strcpy(sb->set_name, info->name);
+ strncpy(sb->set_name, info->name, sizeof(sb->set_name));
} else if (strcmp(update, "devicesize") == 0 &&
__le64_to_cpu(sb->super_offset) <
__le64_to_cpu(sb->data_offset)) {
@@ -1444,7 +1444,7 @@ static int init_super1(struct supertype *st, mdu_array_info_t *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, name);
} else
- strcpy(sb->set_name, name);
+ strncpy(sb->set_name, name, sizeof(sb->set_name));
sb->ctime = __cpu_to_le64((unsigned long long)time(0));
sb->level = __cpu_to_le32(info->level);
--
2.8.0.rc2
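
A minimal sketch of the behaviour this patch relies on, assuming only the sizes stated in the description (a 32-byte set_name, and a 33-byte mdinfo.name that can therefore carry a 32-character name plus its NUL). struct sb_demo, next_field and both helpers are hypothetical stand-ins for illustration, not mdadm code. The point is that strncpy() bounded by sizeof(sb->set_name) never writes past the field, but leaves it without a terminating NUL when the name is exactly 32 characters long, so a reader has to treat set_name as a fixed-width field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SET_NAME_LEN 32	/* size of set_name, per the patch description */

/* Hypothetical stand-in for the part of the superblock that matters here. */
struct sb_demo {
	char set_name[SET_NAME_LEN];
	char next_field;	/* hypothetical neighbour an overflow would clobber */
};

/* Bounded copy as in the patch: writes at most sizeof(set_name) bytes. */
static void set_name_copy(struct sb_demo *sb, const char *name)
{
	assert(sb != NULL && name != NULL);
	strncpy(sb->set_name, name, sizeof(sb->set_name));
}

/*
 * Length of the stored name, treating set_name as a fixed-width field
 * that may not contain a NUL at all.
 */
static size_t set_name_length(const struct sb_demo *sb)
{
	const char *nul = memchr(sb->set_name, '\0', sizeof(sb->set_name));

	return nul ? (size_t)(nul - sb->set_name) : sizeof(sb->set_name);
}
```

With a 32-character name the neighbouring byte stays intact and set_name_length() reports 32; with a short name such as "md0", strncpy() pads the rest of the field with NULs.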
^ permalink raw reply related
* Re: [PATCH v2] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 18:20 UTC (permalink / raw)
To: Shaohua Li
Cc: linux-raid@vger.kernel.org, Jes.Sorensen@redhat.com, Shaohua Li
In-Reply-To: <20160908175636.GA21973@kernel.org>
Sounds good. Let me resend.
Thanks,
Song
>> On 9/8/16, 10:56 AM, "Shaohua Li" <shli@kernel.org> wrote:
On Wed, Sep 07, 2016 at 05:43:35PM -0700, Song Liu wrote:
> struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
> is 33B long. So we need strncpy instead of strcpy to avoid buffer
> overflow.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> super1.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/super1.c b/super1.c
> index f3e4023..46fed54 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
> strcat(sb->set_name, ":");
> strcat(sb->set_name, info->name);
> } else
> - strcpy(sb->set_name, info->name);
> + strncpy(sb->set_name, info->name, 32);
strncpy(sb->set_name, info->name, sizeof(sb->set_name)); ?
^ permalink raw reply
* Re: [PATCH V3] md-cluster: make md-cluster also can work when compiled into kernel
From: Shaohua Li @ 2016-09-08 18:03 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid, v4.1+, NeilBrown
In-Reply-To: <1473041848-28009-1-git-send-email-gqjiang@suse.com>
On Sun, Sep 04, 2016 at 10:17:28PM -0400, Guoqing Jiang wrote:
> md-cluster is compiled as a module by default;
> if it is built into the kernel instead, then
> md-cluster does not work.
>
> [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
> [64782.630528] md-cluster module not found.
> [64782.630530] md127: Could not setup cluster service (-2)
>
> Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
> Cc: stable@vger.kernel.org (v4.1+)
> Cc: NeilBrown <neilb@suse.com>
> Reported-by: Marc Smith <marc.smith@mcc.edu>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
> V3 changes:
> 1. add the "!md_cluster_ops" test back
> 2. fix wrong mail info of stable kernel
>
> V2 changes:
> 1. call try_module_get if md_cluster_ops is already set,
> otherwise try_module_get/module_put are unbalanced.
applied, thanks!
^ permalink raw reply
* Re: [PATCH v2] mdadm: fix a buffer overflow
From: Shaohua Li @ 2016-09-08 17:56 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, Jes.Sorensen, shli
In-Reply-To: <1473295415-1859888-1-git-send-email-songliubraving@fb.com>
On Wed, Sep 07, 2016 at 05:43:35PM -0700, Song Liu wrote:
> struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
> is 33B long. So we need strncpy instead of strcpy to avoid buffer
> overflow.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> super1.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/super1.c b/super1.c
> index f3e4023..46fed54 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
> strcat(sb->set_name, ":");
> strcat(sb->set_name, info->name);
> } else
> - strcpy(sb->set_name, info->name);
> + strncpy(sb->set_name, info->name, 32);
strncpy(sb->set_name, info->name, sizeof(sb->set_name)); ?
^ permalink raw reply
* [PATCH] raid5: allow arbitrary max_hw_sectors
From: Shaohua Li @ 2016-09-08 17:49 UTC (permalink / raw)
To: linux-raid; +Cc: Kernel-team
raid5 splits bios to the proper size internally, so there is no point in
using the underlying disks' max_hw_sectors. In my qemu system, without this
change, raid5 only receives 128k bios, which reduces the chance of merging
the bios sent to the underlying disks.
Signed-off-by: Shaohua Li <shli@fb.com>
---
drivers/md/raid5.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b95c54c..fc0b600 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7066,6 +7066,8 @@ static int raid5_run(struct mddev *mddev)
else
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD,
mddev->queue);
+
+ blk_queue_max_hw_sectors(mddev->queue, UINT_MAX);
}
if (journal_dev) {
--
2.8.0.rc2
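
A rough sketch of the arithmetic behind the 128k figure, under the assumption that the limit is expressed in 512-byte sectors, so that 128k corresponds to 256 sectors. The 1 MiB write size and both helper names are hypothetical, chosen only for illustration; this is not block-layer code.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed sector unit for the limit */

/* Convert a limit given in sectors to bytes. */
static uint64_t sectors_to_bytes(uint32_t sectors)
{
	return (uint64_t)sectors * SECTOR_SIZE;
}

/*
 * Number of bios a write of io_bytes arrives as when each bio is capped
 * at max_sectors (ceiling division).
 */
static uint64_t bios_for_write(uint64_t io_bytes, uint32_t max_sectors)
{
	uint64_t cap;

	assert(max_sectors > 0);
	cap = sectors_to_bytes(max_sectors);
	return (io_bytes + cap - 1) / cap;
}
```

Under these assumptions a 1 MiB write reaches raid5 as 8 separate bios with a 256-sector cap, and as a single bio once the cap is raised to UINT_MAX sectors.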
^ permalink raw reply related
* [PATCH v2] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 0:43 UTC (permalink / raw)
To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu
struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead of strcpy to avoid buffer
overflow.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
super1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/super1.c b/super1.c
index f3e4023..46fed54 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, info->name);
} else
- strcpy(sb->set_name, info->name);
+ strncpy(sb->set_name, info->name, 32);
} else if (strcmp(update, "devicesize") == 0 &&
__le64_to_cpu(sb->super_offset) <
__le64_to_cpu(sb->data_offset)) {
@@ -1444,7 +1444,7 @@ static int init_super1(struct supertype *st, mdu_array_info_t *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, name);
} else
- strcpy(sb->set_name, name);
+ strncpy(sb->set_name, name, 32);
sb->ctime = __cpu_to_le64((unsigned long long)time(0));
sb->level = __cpu_to_le32(info->level);
--
2.8.0.rc2
^ permalink raw reply related
* Re: [PATCH] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 0:39 UTC (permalink / raw)
To: linux-raid@vger.kernel.org; +Cc: Jes.Sorensen@redhat.com, Shaohua Li
In-Reply-To: <1473294509-1828100-1-git-send-email-songliubraving@fb.com>
Actually, there is more similar code. Let me resend a patch that fixes all of it together.
Thanks,
Song
>> On 9/7/16, 5:28 PM, "Song Liu" <songliubraving@fb.com> wrote:
struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead of strcpy to avoid buffer
overflow.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
super1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/super1.c b/super1.c
index f3e4023..942f0d2 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, info->name);
} else
- strcpy(sb->set_name, info->name);
+ strncpy(sb->set_name, info->name, 32);
} else if (strcmp(update, "devicesize") == 0 &&
__le64_to_cpu(sb->super_offset) <
__le64_to_cpu(sb->data_offset)) {
--
2.8.0.rc2
^ permalink raw reply
* [PATCH] mdadm: fix a buffer overflow
From: Song Liu @ 2016-09-08 0:28 UTC (permalink / raw)
To: linux-raid; +Cc: Jes.Sorensen, shli, Song Liu
struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead of strcpy to avoid buffer
overflow.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
super1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/super1.c b/super1.c
index f3e4023..942f0d2 100644
--- a/super1.c
+++ b/super1.c
@@ -1294,7 +1294,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
strcat(sb->set_name, ":");
strcat(sb->set_name, info->name);
} else
- strcpy(sb->set_name, info->name);
+ strncpy(sb->set_name, info->name, 32);
} else if (strcmp(update, "devicesize") == 0 &&
__le64_to_cpu(sb->super_offset) <
__le64_to_cpu(sb->data_offset)) {
--
2.8.0.rc2
^ permalink raw reply related
* lots of "md: export_rdev(sde)" printed after create IMSM RAID10 with missing
From: Yi Zhang @ 2016-09-07 6:43 UTC (permalink / raw)
To: linux-raid; +Cc: shli
In-Reply-To: <338941973.7699634.1473230038475.JavaMail.zimbra@redhat.com>
Hello
I tried to create an IMSM RAID10 with one device missing and found lots of "md: export_rdev(sde)" messages printed. Could anyone help check it?
Steps I used:
mdadm -CR /dev/md0 /dev/sd[b-f] -n5 -e imsm
mdadm -CR /dev/md/Volume0 -l10 -n4 /dev/sd[b-d] missing
Version:
4.8.0-rc5
mdadm - v3.4-84-gbd1fd72 - 25th August 2016
Log:
http://pastebin.com/FJJwvgg6
<6>[ 301.102007] md: bind<sdb>
<6>[ 301.102095] md: bind<sdc>
<6>[ 301.102159] md: bind<sdd>
<6>[ 301.102215] md: bind<sde>
<6>[ 301.102291] md: bind<sdf>
<6>[ 301.103010] ata3.00: Enabling discard_zeroes_data
<6>[ 311.714344] ata3.00: Enabling discard_zeroes_data
<6>[ 311.721866] md: bind<sdb>
<6>[ 311.721965] md: bind<sdc>
<6>[ 311.722029] md: bind<sdd>
<5>[ 311.733165] md/raid10:md127: not clean -- starting background reconstruction
<6>[ 311.733167] md/raid10:md127: active with 3 out of 4 devices
<6>[ 311.733186] md127: detected capacity change from 0 to 240060989440
<6>[ 311.774027] md: bind<sde>
<6>[ 311.810664] md: md127 switched to read-write mode.
<6>[ 311.819885] md: resync of RAID array md127
<6>[ 311.819886] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
<6>[ 311.819887] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
<6>[ 311.819891] md: using 128k window, over a total of 234435328k.
<6>[ 316.606073] ata3.00: Enabling discard_zeroes_data
<6>[ 343.949845] capability: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
<6>[ 1482.314944] md: md127: resync done.
<7>[ 1482.315086] RAID10 conf printout:
<7>[ 1482.315087] --- wd:3 rd:4
<7>[ 1482.315089] disk 0, wo:0, o:1, dev:sdb
<7>[ 1482.315089] disk 1, wo:0, o:1, dev:sdc
<7>[ 1482.315090] disk 2, wo:0, o:1, dev:sdd
<7>[ 1482.315099] RAID10 conf printout:
<7>[ 1482.315099] --- wd:3 rd:4
<7>[ 1482.315100] disk 0, wo:0, o:1, dev:sdb
<7>[ 1482.315100] disk 1, wo:0, o:1, dev:sdc
<7>[ 1482.315101] disk 2, wo:0, o:1, dev:sdd
<7>[ 1482.315101] disk 3, wo:1, o:1, dev:sde
<6>[ 1482.315220] md: recovery of RAID array md127
<6>[ 1482.315221] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
<6>[ 1482.315222] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
<6>[ 1482.315227] md: using 128k window, over a total of 117217664k.
<6>[ 2697.184217] md: md127: recovery done.
<7>[ 2697.524143] RAID10 conf printout:
<7>[ 2697.524144] --- wd:4 rd:4
<7>[ 2697.524146] disk 0, wo:0, o:1, dev:sdb
<7>[ 2697.524146] disk 1, wo:0, o:1, dev:sdc
<7>[ 2697.524147] disk 2, wo:0, o:1, dev:sdd
<7>[ 2697.524148] disk 3, wo:0, o:1, dev:sde
<6>[ 2697.524632] md: export_rdev(sde)
<6>[ 2697.549452] md: export_rdev(sde)
<6>[ 2697.568763] md: export_rdev(sde)
<6>[ 2697.587938] md: export_rdev(sde)
<6>[ 2697.607271] md: export_rdev(sde)
<6>[ 2697.626321] md: export_rdev(sde)
<6>[ 2697.645676] md: export_rdev(sde)
<6>[ 2697.663211] md: export_rdev(sde)
<6>[ 2697.681603] md: export_rdev(sde)
<6>[ 2697.699117] md: export_rdev(sde)
<6>[ 2697.716510] md: export_rdev(sde)
Best Regards,
Yi Zhang
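
A small sketch for reading the log above: it derives the average resync and recovery rates from the dmesg timestamps and the "over a total of" sizes. The helper name is hypothetical, and using the "md: resync of RAID array" / "md: recovery of RAID array" lines as start times is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average throughput in KB/sec between two dmesg timestamps (seconds)
 * for an operation covering total_kb kilobytes.
 */
static double avg_kb_per_sec(double start_s, double end_s, uint64_t total_kb)
{
	assert(end_s > start_s);
	return (double)total_kb / (end_s - start_s);
}
```

Called as avg_kb_per_sec(311.819885, 1482.314944, 234435328) for the resync and avg_kb_per_sec(1482.315220, 2697.184217, 117217664) for the recovery, this gives roughly 200,300 KB/sec and 96,500 KB/sec respectively. The resync average sits close to the 200000 KB/sec ceiling printed in the log, and the recovery total is exactly half of the resync total.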
^ permalink raw reply
* Re: a hard lockup in md raid5 sequential write (v4.7-rc7)
From: Coly Li @ 2016-09-06 16:46 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
In-Reply-To: <20160719233556.GC79792@kernel.org>
On 16/7/20 at 7:35 AM, Shaohua Li wrote:
> On Mon, Jul 18, 2016 at 04:55:04PM +0800, Coly Li wrote:
>> Hi,
>>
>> These days I observe a hard lockup in md raid5. This issue can be easily
>> reproduced in kernel v4.7-rc7 (up to commit:
>> 47ef4ad2684d380dd6d596140fb79395115c3950) by this fio job file:
>>
>> [global]
>> direct=1
>> thread=1
>> [job]
>> filename=/dev/md0
>> blocksize=8m
>> rw=write
>> name=raid5
>> lockmem=1
>> numjobs=40
>> write_bw_log=example
>> group_reporting=1
>> norandommap=1
>> log_avg_msec=0
>> runtime=600.0
>> iodepth=64
>> write_lat_log=example
>>
>> Where md0 is a raid5 target assembled by 3 Memblaze (PBlaze3) PCIe SSDs.
>> This test runs on a dual 10-core processors Dell T7910 machine.
>>
>> From the crash dump, dmesg of the panic by nmi watchdog timeout is,
>>
>> [ 2330.544036] NMI watchdog: Watchdog detected hard LOCKUP on cpu
>> 18.dModules linked in: raid456 async_raid6_recov async_memcpy libcrc32c
>> async_pq async_xor async_tx joydev st memdisk(O) memcon(O) af_packet
>> iscsi_ibft iscsi_boot_sysfs msr snd_hda_codec_hdmi intel_rapl sb_edac
>> raid1 edac_core x86_pkg_temp_thermal intel_powerclamp coretemp raid0
>> md_mod snd_hda_codec_realtek snd_hda_codec_generic kvm_intel kvm
>> snd_hda_intel irqbypass snd_hda_codec crct10dif_pclmul snd_hda_core
>> crc32_pclmul ghash_clmulni_intel snd_hwdep dm_mod aesni_intel aes_x86_64
>> snd_pcm mei_wdt e1000e igb iTCO_wdt lrw dcdbas iTCO_vendor_support
>> snd_timer gf128mul mei_me dell_smm_hwmon glue_helper serio_raw
>> ablk_helper cryptd snd lpc_ich pcspkr ptp i2c_i801 mei mptctl dca
>> mfd_core pps_core soundcore mptbase shpchp fjes tpm_tis tpm btrfs xor
>> raid6_pq hid_generic usbhid crc32c_intel nouveau video mxm_wmi
>> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci
>> fb_sys_fops ehci_pci xhci_hcd ehci_hcd sr_mod ttm cd
>> [ 2330.544036] CPU: 18 PID: 30308 Comm: kworker/u42:4 Tainted: G
>> O 4.7.0-rc7-vanilla #1
>> [ 2330.544036] Hardware name: Dell Inc. Precision Tower 7910/0215PR,
>> BIOS A07 04/14/2015
>> [ 2330.544036] Workqueue: raid5wq raid5_do_work [raid456]
>> [ 2330.544036] 0000000000000000 ffff88103f405bb0 ffffffff813a6eea
>> 0000000000000000
>> [ 2330.544036] 0000000000000000 ffff88103f405bc8 ffffffff8113c3e8
>> ffff8808dc7d8800
>> [ 2330.544036] ffff88103f405c00 ffffffff81180f8c 0000000000000001
>> ffff88103f40a440
>> [ 2330.544036] Call Trace:
>> [ 2330.544036] <NMI> [<ffffffff813a6eea>] dump_stack+0x63/0x89
>> [ 2330.544036] [<ffffffff8113c3e8>] watchdog_overflow_callback+0xc8/0xf0
>> [ 2330.544036] [<ffffffff81180f8c>] __perf_event_overflow+0x7c/0x1b0
>> [ 2330.544036] [<ffffffff8118b644>] perf_event_overflow+0x14/0x20
>> [ 2330.544036] [<ffffffff8100bf57>] intel_pmu_handle_irq+0x1c7/0x460
>> [ 2330.544036] [<ffffffff810053ad>] perf_event_nmi_handler+0x2d/0x50
>> [ 2330.544036] [<ffffffff810312e1>] nmi_handle+0x61/0x140
>> [ 2330.544036] [<ffffffff81031888>] default_do_nmi+0x48/0x130
>> [ 2330.544036] [<ffffffff81031a5b>] do_nmi+0xeb/0x160
>> [ 2330.544036] [<ffffffff816e5c71>] end_repeat_nmi+0x1a/0x1e
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] <<EOE>> [<ffffffff81193bbf>]
>> queued_spin_lock_slowpath+0xb/0xf
>> [ 2330.544036] [<ffffffff816e31ff>] _raw_spin_lock_irq+0x2f/0x40
>> [ 2330.544036] [<ffffffffa084c5d8>]
>> handle_active_stripes.isra.51+0x378/0x4f0 [raid456]
>> [ 2330.544036] [<ffffffffa083f1a6>] ?
>> raid5_wakeup_stripe_thread+0x96/0x1b0 [raid456]
>> [ 2330.544036] [<ffffffffa084cf1d>] raid5_do_work+0x8d/0x120 [raid456]
>> [ 2330.544036] [<ffffffff8109b5bb>] process_one_work+0x14b/0x450
>> [ 2330.544036] [<ffffffff8109b9eb>] worker_thread+0x12b/0x490
>> [ 2330.544036] [<ffffffff8109b8c0>] ? process_one_work+0x450/0x450
>> [ 2330.544036] [<ffffffff810a1599>] kthread+0xc9/0xe0
>> [ 2330.544036] [<ffffffff816e3a9f>] ret_from_fork+0x1f/0x40
>> [ 2330.544036] [<ffffffff810a14d0>] ? kthread_create_on_node+0x180/0x180
>> [ 2330.544036] Kernel panic - not syncing: Hard LOCKUP
>> [ 2330.544036] CPU: 18 PID: 30308 Comm: kworker/u42:4 Tainted: G
>> O 4.7.0-rc7-vanilla #1
>> [ 2330.544036] Hardware name: Dell Inc. Precision Tower 7910/0215PR,
>> BIOS A07 04/14/2015
>> [ 2330.544036] Workqueue: raid5wq raid5_do_work [raid456]
>> [ 2330.544036] 0000000000000000 ffff88103f405b28 ffffffff813a6eea
>> ffffffff81a45241
>> [ 2330.544036] 0000000000000000 ffff88103f405ba0 ffffffff81193642
>> 0000000000000010
>> [ 2330.544036] ffff88103f405bb0 ffff88103f405b50 0000000000000086
>> ffffffff81a2a2e2
>> [ 2330.544036] Call Trace:
>> [ 2330.544036] <NMI> [<ffffffff813a6eea>] dump_stack+0x63/0x89
>> [ 2330.544036] [<ffffffff81193642>] panic+0xd2/0x223
>> [ 2330.544036] [<ffffffff810823af>] nmi_panic+0x3f/0x40
>> [ 2330.544036] [<ffffffff8113c401>] watchdog_overflow_callback+0xe1/0xf0
>> [ 2330.544036] [<ffffffff81180f8c>] __perf_event_overflow+0x7c/0x1b0
>> [ 2330.544036] [<ffffffff8118b644>] perf_event_overflow+0x14/0x20
>> [ 2330.544036] [<ffffffff8100bf57>] intel_pmu_handle_irq+0x1c7/0x460
>> [ 2330.544036] [<ffffffff810053ad>] perf_event_nmi_handler+0x2d/0x50
>> [ 2330.544036] [<ffffffff810312e1>] nmi_handle+0x61/0x140
>> [ 2330.544036] [<ffffffff81031888>] default_do_nmi+0x48/0x130
>> [ 2330.544036] [<ffffffff81031a5b>] do_nmi+0xeb/0x160
>> [ 2330.544036] [<ffffffff816e5c71>] end_repeat_nmi+0x1a/0x1e
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] [<ffffffff810cbcc7>] ?
>> native_queued_spin_lock_slowpath+0x117/0x1a0
>> [ 2330.544036] <<EOE>> [<ffffffff81193bbf>]
>> queued_spin_lock_slowpath+0xb/0xf
>> [ 2330.544036] [<ffffffff816e31ff>] _raw_spin_lock_irq+0x2f/0x40
>> [ 2330.544036] [<ffffffffa084c5d8>]
>> handle_active_stripes.isra.51+0x378/0x4f0 [raid456]
>> [ 2330.544036] [<ffffffffa083f1a6>] ?
>> raid5_wakeup_stripe_thread+0x96/0x1b0 [raid456]
>> [ 2330.544036] [<ffffffffa084cf1d>] raid5_do_work+0x8d/0x120 [raid456]
>> [ 2330.544036] [<ffffffff8109b5bb>] process_one_work+0x14b/0x450
>> [ 2330.544036] [<ffffffff8109b9eb>] worker_thread+0x12b/0x490
>> [ 2330.544036] [<ffffffff8109b8c0>] ? process_one_work+0x450/0x450
>> [ 2330.544036] [<ffffffff810a1599>] kthread+0xc9/0xe0
>> [ 2330.544036] [<ffffffff816e3a9f>] ret_from_fork+0x1f/0x40
>> [ 2330.544036] [<ffffffff810a14d0>] ? kthread_create_on_node+0x180/0x180
>>
>> The crash dump file is quite big (124MB), I need to find a method to
>> share, if anyone of you wants it, please let me know.
>>
>> IMHO, this hard lockup seems related to bitmap allocation, because it
>> can be easily reproduced on a new-created md raid5 target, with 40+
>> processes doing big size (8MB+) writing.
>
> Hi,
>
> Sounds like a deadlock. Can you enable lockdep and run the test again and see
> if lockdep gives any hint?
Hi Shaohua,
I reproduced the hard lockup on 4.8-rc5. This time I enabled lockdep, but
the information is very limited. Here is the panic information:
[ 616.690899] NMI watchdog: Watchdog detected hard LOCKUP on cpu
16.dModules linked in: af_packet iscsi_ibft iscsi_boot_sysfs msr
ipmi_ssif intel_rapl cdc_ether usbnet mii edac_core x86_pkg_temp_thermal
intel_powerclamp coretemp raid456 async_raid6_recov async_memcpy
libcrc32c async_pq async_xor async_tx kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
ipmi_devintf lrw md_mod glue_helper ablk_helper cryptd pcspkr tg3
iTCO_wdt ptp iTCO_vendor_support pps_core i2c_i801 libphy i2c_smbus
mei_me lpc_ich mxm_wmi mfd_core mei shpchp ipmi_si ipmi_msghandler fjes
wmi tpm_tis tpm_tis_core acpi_pad button tpm hid_generic usbhid btrfs
xor zlib_deflate raid6_pq crc32c_intel megaraid_sas xhci_pci xhci_hcd
mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm ehci_pci ehci_hcd nvme usbcore usb_common nvme_core
sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[ 616.690899] irq event stamp: 27052
[ 616.690900] hardirqs last enabled at (27051): [<ffffffff81104a15>]
vprintk_emit+0x1d5/0x550
[ 616.690900] hardirqs last disabled at (27052): [<ffffffff817cbb5f>]
_raw_spin_lock_irq+0x1f/0x90
[ 616.690901] softirqs last enabled at (26668): [<ffffffff817cf857>]
__do_softirq+0x1f7/0x4b7
[ 616.690901] softirqs last disabled at (26659): [<ffffffff8109803b>]
irq_exit+0xab/0xc0
[ 616.690901] CPU: 16 PID: 11681 Comm: fio Not tainted 4.8.0-rc4-vanilla #2
[ 616.690902] Hardware name: LENOVO System x3650 M5
-[546225Z]-/XXXXXXX, BIOS -[TCE123M-2.10]- 06/23/2016
[ 616.690902] 0000000000000000 ffff880256c05ba8 ffffffff81438dec
0000000000000000
[ 616.690903] 0000000000000010 ffff880256c05bc8 ffffffff8117ea7f
ffff88017dc4e800
[ 616.690903] 0000000000000000 ffff880256c05c00 ffffffff811c6b5b
0000000000000001
[ 616.690903] Call Trace:
[ 616.690904] <NMI> [<ffffffff81438dec>] dump_stack+0x85/0xc9
[ 616.690904] [<ffffffff8117ea7f>] watchdog_overflow_callback+0x13f/0x160
[ 616.690904] [<ffffffff811c6b5b>] __perf_event_overflow+0x8b/0x1d0
[ 616.690905] [<ffffffff811d3f94>] perf_event_overflow+0x14/0x20
[ 616.690905] [<ffffffff8100c631>] intel_pmu_handle_irq+0x1d1/0x4a0
[ 616.690906] [<ffffffff8100582d>] perf_event_nmi_handler+0x2d/0x50
[ 616.690906] [<ffffffff810391ae>] nmi_handle+0x9e/0x2d0
[ 616.690906] [<ffffffff81039115>] ? nmi_handle+0x5/0x2d0
[ 616.690906] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690907] [<ffffffff81039631>] default_do_nmi+0x71/0x1b0
[ 616.690907] [<ffffffff8103988c>] do_nmi+0x11c/0x190
[ 616.690907] [<ffffffff817cdf91>] end_repeat_nmi+0x1a/0x1e
[ 616.690908] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690908] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690909] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690909] <<EOE>> [<ffffffff81458d87>]
debug_smp_processor_id+0x17/0x20
[ 616.690909] [<ffffffff814465f2>] delay_tsc+0x22/0xc0
[ 616.690910] [<ffffffff8144650f>] __delay+0xf/0x20
[ 616.690910] [<ffffffff810f5626>] do_raw_spin_lock+0x86/0x130
[ 616.690910] [<ffffffff817cbbac>] _raw_spin_lock_irq+0x6c/0x90
[ 616.690911] [<ffffffffa083623b>] ?
raid5_get_active_stripe+0x6b/0x8c0 [raid456]
[ 616.690911] [<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0
[raid456]
[ 616.690911] [<ffffffff817cbcaa>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[ 616.690912] [<ffffffff810e1352>] ? prepare_to_wait+0x62/0x90
[ 616.690912] [<ffffffffa0836c6a>] raid5_make_request+0x1da/0xb50
[raid456]
[ 616.690912] [<ffffffffa05082aa>] ? md_make_request+0x1fa/0x4f0 [md_mod]
[ 616.690913] [<ffffffff810e1810>] ? prepare_to_wait_event+0x100/0x100
[ 616.690913] [<ffffffffa05082aa>] md_make_request+0x1fa/0x4f0 [md_mod]
[ 616.690914] [<ffffffffa0508107>] ? md_make_request+0x57/0x4f0 [md_mod]
[ 616.690914] [<ffffffff814043f9>] generic_make_request+0x159/0x2c0
[ 616.690914] [<ffffffff810ea809>] ? get_lock_stats+0x19/0x60
[ 616.690915] [<ffffffff814045cd>] submit_bio+0x6d/0x150
[ 616.690915] [<ffffffff810ed69d>] ? trace_hardirqs_on+0xd/0x10
[ 616.690915] [<ffffffff812c28aa>] do_blockdev_direct_IO+0x163a/0x2630
[ 616.690916] [<ffffffff812bd780>] ? I_BDEV+0x20/0x20
[ 616.690916] [<ffffffff812c38da>] __blockdev_direct_IO+0x3a/0x40
[ 616.690916] [<ffffffff812bde7c>] blkdev_direct_IO+0x4c/0x70
[ 616.690917] [<ffffffff811e39a7>] generic_file_direct_write+0xa7/0x160
[ 616.690917] [<ffffffff811e3b1d>] __generic_file_write_iter+0xbd/0x1e0
[ 616.690917] [<ffffffff812be7b0>] ? bd_acquire+0xb0/0xb0
[ 616.690918] [<ffffffff812be822>] blkdev_write_iter+0x72/0xd0
[ 616.690918] [<ffffffff813dd648>] ? apparmor_file_permission+0x18/0x20
[ 616.690918] [<ffffffff813a38ed>] ? security_file_permission+0x3d/0xc0
[ 616.690919] [<ffffffff812d4549>] aio_run_iocb+0x239/0x2c0
[ 616.690919] [<ffffffff812d5703>] do_io_submit+0x233/0x860
[ 616.690919] [<ffffffff812d58aa>] ? do_io_submit+0x3da/0x860
[ 616.690920] [<ffffffff812d5d40>] SyS_io_submit+0x10/0x20
[ 616.690920] [<ffffffff817cc6c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 616.690920] [<ffffffff81458da3>] ? __this_cpu_preempt_check+0x13/0x20
[ 616.690921] Kernel panic - not syncing: Hard LOCKUP
[ 616.690921] CPU: 16 PID: 11681 Comm: fio Not tainted 4.8.0-rc4-vanilla #2
[ 616.690921] Hardware name: LENOVO System x3650 M5
-[546225Z]-/XXXXXXX, BIOS -[TCE123M-2.10]- 06/23/2016
[ 616.690922] 0000000000000000 ffff880256c05b20 ffffffff81438dec
0000000000000000
[ 616.690922] ffffffff81a69c3f ffff880256c05b98 ffffffff811dd399
0000000000000010
[ 616.690922] ffff880256c05ba8 ffff880256c05b48 0000000000000086
ffffffff81a4bcbe
[ 616.690923] Call Trace:
[ 616.690923] <NMI> [<ffffffff81438dec>] dump_stack+0x85/0xc9
[ 616.690923] [<ffffffff811dd399>] panic+0xe0/0x22c
[ 616.690924] [<ffffffff810905af>] nmi_panic+0x3f/0x40
[ 616.690924] [<ffffffff8117ea91>] watchdog_overflow_callback+0x151/0x160
[ 616.690924] [<ffffffff811c6b5b>] __perf_event_overflow+0x8b/0x1d0
[ 616.690925] [<ffffffff811d3f94>] perf_event_overflow+0x14/0x20
[ 616.690925] [<ffffffff8100c631>] intel_pmu_handle_irq+0x1d1/0x4a0
[ 616.690925] [<ffffffff8100582d>] perf_event_nmi_handler+0x2d/0x50
[ 616.690926] [<ffffffff810391ae>] nmi_handle+0x9e/0x2d0
[ 616.690926] [<ffffffff81039115>] ? nmi_handle+0x5/0x2d0
[ 616.690926] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690927] [<ffffffff81039631>] default_do_nmi+0x71/0x1b0
[ 616.690927] [<ffffffff8103988c>] do_nmi+0x11c/0x190
[ 616.690927] [<ffffffff817cdf91>] end_repeat_nmi+0x1a/0x1e
[ 616.690928] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690928] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690929] [<ffffffff81458ca3>] ? check_preemption_disabled+0x23/0xf0
[ 616.690929] <<EOE>> [<ffffffff81458d87>]
debug_smp_processor_id+0x17/0x20
[ 616.690929] [<ffffffff814465f2>] delay_tsc+0x22/0xc0
[ 616.690929] [<ffffffff8144650f>] __delay+0xf/0x20
[ 616.690930] [<ffffffff810f5626>] do_raw_spin_lock+0x86/0x130
[ 616.690930] [<ffffffff817cbbac>] _raw_spin_lock_irq+0x6c/0x90
[ 616.690931] [<ffffffffa083623b>] ?
raid5_get_active_stripe+0x6b/0x8c0 [raid456]
[ 616.690931] [<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0
[raid456]
[ 616.690931] [<ffffffff817cbcaa>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[ 616.690932] [<ffffffff810e1352>] ? prepare_to_wait+0x62/0x90
[ 616.690932] [<ffffffffa0836c6a>] raid5_make_request+0x1da/0xb50
[raid456]
[ 616.690932] [<ffffffffa05082aa>] ? md_make_request+0x1fa/0x4f0 [md_mod]
[ 616.690933] [<ffffffff810e1810>] ? prepare_to_wait_event+0x100/0x100
[ 616.690933] [<ffffffffa05082aa>] md_make_request+0x1fa/0x4f0 [md_mod]
[ 616.690934] [<ffffffffa0508107>] ? md_make_request+0x57/0x4f0 [md_mod]
[ 616.690934] [<ffffffff814043f9>] generic_make_request+0x159/0x2c0
[ 616.690934] [<ffffffff810ea809>] ? get_lock_stats+0x19/0x60
[ 616.690935] [<ffffffff814045cd>] submit_bio+0x6d/0x150
[ 616.690935] [<ffffffff810ed69d>] ? trace_hardirqs_on+0xd/0x10
[ 616.690935] [<ffffffff812c28aa>] do_blockdev_direct_IO+0x163a/0x2630
[ 616.690936] [<ffffffff812bd780>] ? I_BDEV+0x20/0x20
[ 616.690936] [<ffffffff812c38da>] __blockdev_direct_IO+0x3a/0x40
[ 616.690936] [<ffffffff812bde7c>] blkdev_direct_IO+0x4c/0x70
[ 616.690937] [<ffffffff811e39a7>] generic_file_direct_write+0xa7/0x160
[ 616.690937] [<ffffffff811e3b1d>] __generic_file_write_iter+0xbd/0x1e0
[ 616.690937] [<ffffffff812be7b0>] ? bd_acquire+0xb0/0xb0
[ 616.690938] [<ffffffff812be822>] blkdev_write_iter+0x72/0xd0
[ 616.690938] [<ffffffff813dd648>] ? apparmor_file_permission+0x18/0x20
[ 616.690938] [<ffffffff813a38ed>] ? security_file_permission+0x3d/0xc0
[ 616.690939] [<ffffffff812d4549>] aio_run_iocb+0x239/0x2c0
[ 616.690939] [<ffffffff812d5703>] do_io_submit+0x233/0x860
[ 616.690939] [<ffffffff812d58aa>] ? do_io_submit+0x3da/0x860
[ 616.690940] [<ffffffff812d5d40>] SyS_io_submit+0x10/0x20
[ 616.690940] [<ffffffff817cc6c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 616.690940] [<ffffffff81458da3>] ? __this_cpu_preempt_check+0x13/0x20
Only these 4 lines about irq stamps can be found,
[ 616.690899] irq event stamp: 27052
[ 616.690900] hardirqs last enabled at (27051): [<ffffffff81104a15>]
vprintk_emit+0x1d5/0x550
[ 616.690900] hardirqs last disabled at (27052): [<ffffffff817cbb5f>]
_raw_spin_lock_irq+0x1f/0x90
[ 616.690901] softirqs last enabled at (26668): [<ffffffff817cf857>]
__do_softirq+0x1f7/0x4b7
[ 616.690901] softirqs last disabled at (26659): [<ffffffff8109803b>]
irq_exit+0xab/0xc0
I suspect this panic is introduced by r5conf->hash_locks[]. I have a
kdump image and tried to analyze it; here is what I found.
In conf->hash_locks[], hash_locks[0] and hash_locks[7] both have
rlock.raw_lock.val.counter set to 1. Here is the crash output for each
spin lock:
hash_locks[0]:
{
{
rlock = {
raw_lock = {
val = {
counter = 1
}
},
magic = 3735899821,
owner_cpu = 13,
owner = 0xffff880244ff80c0,
dep_map = {
key = 0xffffffffa0845bd0 <__key.47065>,
class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
cpu = 13,
ip = 18446744072107549243
}
},
{
__padding =
"\001\000\000\000\255N\255\336\r\000\000\000\000\000\000\000\300\200\377D\002\210\377\377",
dep_map = {
key = 0xffffffffa0845bd0 <__key.47065>,
class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
cpu = 13,
ip = 18446744072107549243
}
}
}
}
hash_locks[7]:
{
{
rlock = {
raw_lock = {
val = {
counter = 1
}
},
magic = 3735899821,
owner_cpu = 19,
owner = 0xffff88024716c340,
dep_map = {
key = 0xffffffffa0845bc8 <__key.47066>,
class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
cpu = 19,
ip = 18446744072107549243
}
},
{
__padding =
"\001\000\000\000\255N\255\336\023\000\000\000\000\000\000\000@\303\026G\002\210\377\377",
dep_map = {
key = 0xffffffffa0845bc8 <__key.47066>,
class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
cpu = 19,
ip = 18446744072107549243
}
}
}
}
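
As a cross-check of the dump, the __padding string printed for hash_locks[0] can be decoded by hand. The sketch below assumes a little-endian layout with a 4-byte lock word at offset 0, the 4-byte magic at offset 4, owner_cpu at offset 8 and the 8-byte owner pointer at offset 16; these offsets are deduced from the values printed above, not taken from the kernel headers. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* __padding bytes of hash_locks[0], transcribed from the crash output. */
static const unsigned char hash_lock0_padding[24] = {
	0x01, 0x00, 0x00, 0x00,		/* \001\000\000\000 */
	0xad, 0x4e, 0xad, 0xde,		/* \255 N \255 \336 */
	0x0d, 0x00, 0x00, 0x00,		/* \r \000\000\000  */
	0x00, 0x00, 0x00, 0x00,		/* \000 x 4         */
	0xc0, 0x80, 0xff, 0x44,		/* \300\200\377 D   */
	0x02, 0x88, 0xff, 0xff,		/* \002\210\377\377 */
};

/* Read a little-endian 32-bit value at byte offset off. */
static uint32_t read_le32(const unsigned char *buf, size_t len, size_t off)
{
	assert(off + 4 <= len);
	return (uint32_t)buf[off] |
	       (uint32_t)buf[off + 1] << 8 |
	       (uint32_t)buf[off + 2] << 16 |
	       (uint32_t)buf[off + 3] << 24;
}

/* Read a little-endian 64-bit value at byte offset off. */
static uint64_t read_le64(const unsigned char *buf, size_t len, size_t off)
{
	assert(off + 8 <= len);
	return (uint64_t)read_le32(buf, len, off) |
	       (uint64_t)read_le32(buf, len, off + 4) << 32;
}
```

Decoded this way, the bytes give a lock word of 1, a magic of 0xdead4ead (3735899821 in decimal), owner_cpu 13 and owner 0xffff880244ff80c0, which agrees with the named fields in the same dump.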
From the above information, both locks have the same acquire ip,
18446744072107549243, so I checked all 'fio' threads that acquired a
lock at this address. I found 12 threads acquiring a spin lock at
this address, spread across 2 lock instances.
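
crash prints acquire_ip in decimal, which hides where it points. A minimal sketch of the conversion follows, using the frame "[<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0" from the panic trace as the reference; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* From the trace: [<ffffffffa083623b>] raid5_get_active_stripe+0x6b/0x8c0 */
#define TRACE_ADDR	0xffffffffa083623bULL
#define TRACE_OFFSET	0x6bULL

/* Start address of raid5_get_active_stripe implied by the trace frame. */
static uint64_t symbol_base(void)
{
	return TRACE_ADDR - TRACE_OFFSET;
}

/* Offset of an instruction pointer inside raid5_get_active_stripe. */
static uint64_t offset_in_symbol(uint64_t ip)
{
	assert(ip >= symbol_base());
	return ip - symbol_base();
}
```

18446744072107549243 is 0xffffffffa083623b, so offset_in_symbol() returns 0x6b: the acquire ip is the raid5_get_active_stripe+0x6b frame seen in the trace. The one differing value listed below for PID 11678, 18446744072107549776, comes out at +0x280, still inside the 0x8c0-byte function.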
For conf->hash_locks[0], the lockdep_map instance address is
0xffff880246f60820, and its content is:
struct lockdep_map {
key = 0xffffffffa0845bd0 <__key.47065>,
class_cache = {0xffffffff828e6e70 <lock_classes+640336>, 0x0},
name = 0xffffffffa083ec5f "&(conf->hash_locks)->rlock",
cpu = 13,
ip = 18446744072107549243
}
7 'fio' threads have this lock in their task_struct->held_locks
list:
PID: 11629 TASK: ffff880251ba4f80 CPU: 3 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11643 TASK: ffff88022d331300 CPU: 10 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11648 TASK: ffff88024c3a1440 CPU: 14 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11652 TASK: ffff88024e1b5540 CPU: 5 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11653 TASK: ffff880229a70000 CPU: 12 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11656 TASK: ffff880244ff80c0 CPU: 13 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
PID: 11657 TASK: ffff880243220100 CPU: 7 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60820,
For conf->hash_locks[7], the lockdep_map instance address is
0xffff880246f60a18, content is,
struct lockdep_map {
key = 0xffffffffa0845bc8 <__key.47066>,
class_cache = {0xffffffff828e7440 <lock_classes+641824>, 0x0},
name = 0xffffffffa083f800 "&(conf->hash_locks + i)->rlock",
cpu = 19,
ip = 18446744072107549243
}
There are 5 'fio' threads that have this lock in their
task_struct->held_locks list:
PID: 11663 TASK: ffff880242bb0280 CPU: 18 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60a18,
PID: 11666 TASK: ffff88024716c340 CPU: 19 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60a18,
PID: 11671 TASK: ffff880252504480 CPU: 1 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60a18,
PID: 11678 TASK: ffff88023735c640 CPU: 2 COMMAND: "fio"
acquire_ip = 18446744072107549776,
instance = 0xffff880246f60a18,
PID: 11681 TASK: ffff880241a40700 CPU: 16 COMMAND: "fio"
acquire_ip = 18446744072107549243,
instance = 0xffff880246f60a18,
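The acquire_ip values in the two lists are printed in decimal, which makes
them hard to compare with the hexadecimal addresses elsewhere in the crash
output. A small sketch of the conversion follows; the helper names are
illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* acquire_ip values copied from the held_locks listings. */
#define ACQUIRE_IP_COMMON     18446744072107549243ULL
#define ACQUIRE_IP_PID_11678  18446744072107549776ULL

/* Format a 64-bit value as a 16-digit hexadecimal kernel address. */
void format_addr(uint64_t value, char *buf, size_t len)
{
	snprintf(buf, len, "0x%016llx", (unsigned long long)value);
}

/* Distance in bytes between two instruction addresses. */
uint64_t ip_delta(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}
```

The common value is 0xffffffffa083623b, which lies in the same
0xffffffffa08xxxxx range as the lock key and name pointers in the dump, so
it is probably inside the same module. PID 11678 is the one exception in
the lists: its acquire_ip is 0xffffffffa0836450, 533 bytes further on.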
Unfortunately, I don't see the panic thread ID 11681 from the above 12
threads.
My current guess is that this hard lockup is triggered by a task that is
waiting while it holds a spin lock. So I checked all the wait queues that
r5conf has:
wait_queue_head_t wait_for_quiescent;
wait_queue_head_t wait_for_stripe;
wait_queue_head_t wait_for_overlap;
From crash output,
- conf->wait_for_quiescent is empty.
- conf->wait_for_stripe has 6 threads on it,
PID: 11641 TASK: ffff880219a89280 CPU: 0 COMMAND: "fio"
PID: 11685 TASK: ffff880228f6c800 CPU: 2 COMMAND: "fio"
PID: 11628 TASK: ffff88017e0d8f40 CPU: 1 COMMAND: "fio"
PID: 11672 TASK: ffff8802525fc4c0 CPU: 10 COMMAND: "fio"
PID: 11638 TASK: ffff8801d27911c0 CPU: 1 COMMAND: "fio"
PID: 11632 TASK: ffff88024cfc1040 CPU: 10 COMMAND: "fio"
- conf->wait_for_overlap has 5 threads on it,
PID: 11657 TASK: ffff880243220100 CPU: 7 COMMAND: "fio"
PID: 11629 TASK: ffff880251ba4f80 CPU: 3 COMMAND: "fio"
PID: 11644 TASK: ffff88017e219340 CPU: 3 COMMAND: "fio"
PID: 11636 TASK: ffff8802199e9140 CPU: 3 COMMAND: "fio"
PID: 11630 TASK: ffff880250724fc0 CPU: 17 COMMAND: "fio"
What might be interesting is that 2 threads (pid 11657, pid 11629) are on
the conf->wait_for_overlap list, but they are also among the threads
which have conf->hash_locks[0] in their task->held_locks list.
I am not sure whether this is the reason that IRQs are disabled for too long.
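The overlap between lock holders and waiters can be checked mechanically.
The sketch below uses only the PIDs listed in the crash output; the array
and function names are invented for illustration.

```c
#include <stddef.h>

/* PIDs copied from the held_locks and wait queue listings. */
static const int hash_lock0_holders[] = { 11629, 11643, 11648, 11652,
					  11653, 11656, 11657 };
static const int hash_lock7_holders[] = { 11663, 11666, 11671, 11678, 11681 };
static const int wait_for_stripe[]    = { 11641, 11685, 11628, 11672,
					  11638, 11632 };
static const int wait_for_overlap[]   = { 11657, 11629, 11644, 11636, 11630 };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Store the PIDs that appear in both lists in out[] (capacity >= na),
 * in the order they appear in a[]. Returns how many were stored.
 */
size_t pid_intersect(const int *a, size_t na,
		     const int *b, size_t nb, int *out)
{
	size_t n = 0;

	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				out[n++] = a[i];
				break;
			}
		}
	}
	return n;
}
```

Intersecting hash_lock0_holders with wait_for_overlap yields 11629 and
11657. The other three combinations (hash_locks[0] with wait_for_stripe,
and hash_locks[7] with either queue) are empty.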
This hard lockup is very easy to reproduce on fast storage
devices (e.g. NVMe SSDs); what I have are Memblaze Pblaze3 PCIe SSDs. I am
glad to provide any debug information or do any testing. I have been
looking at this issue for a while, but have made little progress.
Thanks in advance for taking a look at it.
Coly
* [PATCH] dm: Return correct value in retry loop
From: Minfei Huang @ 2016-09-06 8:00 UTC (permalink / raw)
To: agk, snitzer, shli; +Cc: dm-devel, linux-raid, linux-kernel, Minfei Huang
dm_resume will return silently on a failure in the retry loop. Assign a
correct return value in the failed loop.
Remove a useless assignment as well.
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
---
drivers/md/dm.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fa9b1cb..c935cc8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2249,10 +2249,11 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
int dm_resume(struct mapped_device *md)
{
- int r = -EINVAL;
+ int r;
struct dm_table *map = NULL;
retry:
+ r = -EINVAL;
mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
if (!dm_suspended_md(md))
@@ -2277,10 +2278,8 @@ retry:
clear_bit(DMF_SUSPENDED, &md->flags);
- r = 0;
out:
mutex_unlock(&md->suspend_lock);
-
return r;
}
--
2.7.4 (Apple Git-66)
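The effect of moving the assignment under the retry label can be shown
with a small userspace model. This is a sketch only: struct fake_md, its
fields and model_resume() are hypothetical stand-ins, and it assumes that
the wait in the retry path leaves 0 in r when it succeeds, which the hunk
above does not show.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mapped_device; not the kernel type. */
struct fake_md {
	bool suspended;            /* models dm_suspended_md() */
	int  waits_left;           /* passes through the wait-and-retry path */
	bool suspended_after_wait; /* device state once the wait returns */
};

/*
 * reset_on_retry == false models the old code (r set once at declaration),
 * reset_on_retry == true models the patched code (r set under "retry:").
 */
int model_resume(struct fake_md *md, bool reset_on_retry)
{
	int r = -EINVAL;

retry:
	if (reset_on_retry)
		r = -EINVAL;

	if (!md->suspended)
		goto out;          /* nothing to resume: should be an error */

	if (md->waits_left > 0) {
		md->waits_left--;
		r = 0;             /* assumed: successful wait overwrites r */
		md->suspended = md->suspended_after_wait;
		goto retry;
	}

	r = 0;                     /* stands in for a successful resume */
	md->suspended = false;
out:
	return r;
}
```

With { true, 1, false }, i.e. a device that is no longer suspended once the
wait returns, the old shape returns 0 although nothing was resumed, while
the patched shape returns -EINVAL. When no retry happens both shapes
behave the same.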
* RE: Checkarray doesn't seem to do anything
From: Mikael Abrahamsson @ 2016-09-05 8:15 UTC (permalink / raw)
To: linux-raid
In-Reply-To: <20160302125329.0a91db92e0d9ae1e5e3fde0508678719.6af0e5bebc.wbe@email03.secureserver.net>
I just wanted to say I ran into this on Ubuntu 16.04 just now, and it's
still not fixed.
It seems there are multiple bug IDs:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950#10
https://bugs.launchpad.net/debian/+source/mdadm/+bug/1550823
But this is still not fixed in Ubuntu 16.04, potentially in Debian as
well.
So it might be good for everybody to know that, for now, arrays most
likely aren't being periodically checked after an upgrade to Ubuntu
16.04.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: [PATCH V3] md-cluster: make md-cluster also can work when compiled into kernel
From: NeilBrown @ 2016-09-05 3:10 UTC (permalink / raw)
To: linux-raid; +Cc: shli, Guoqing Jiang, v4.1+
In-Reply-To: <1473041848-28009-1-git-send-email-gqjiang@suse.com>
[-- Attachment #1: Type: text/plain, Size: 1790 bytes --]
On Mon, Sep 05 2016, Guoqing Jiang wrote:
> The md-cluster is compiled as module by default,
> if it is compiled by built-in way, then we can't
> make md-cluster works.
>
> [64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
> [64782.630528] md-cluster module not found.
> [64782.630530] md127: Could not setup cluster service (-2)
>
> Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions")
> Cc: stable@vger.kernel.org (v4.1+)
> Cc: NeilBrown <neilb@suse.com>
> Reported-by: Marc Smith <marc.smith@mcc.edu>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
> V3 changes:
> 1. add the "!md_cluster_ops" test back
> 2. fix wrong mail info of stable kernel
>
> V2 changes:
> 1. call try_module_get if md_cluster_ops is already set,
> otherwise try_module_get/module_put are unbalanced.
>
> drivers/md/md.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 67642ba..915e84d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7610,16 +7610,12 @@ EXPORT_SYMBOL(unregister_md_cluster_operations);
>
> int md_setup_cluster(struct mddev *mddev, int nodes)
> {
> - int err;
> -
> - err = request_module("md-cluster");
> - if (err) {
> - pr_err("md-cluster module not found.\n");
> - return -ENOENT;
> - }
> -
> + if (!md_cluster_ops)
> + request_module("md-cluster");
> spin_lock(&pers_lock);
> + /* ensure module won't be unloaded */
> if (!md_cluster_ops || !try_module_get(md_cluster_mod)) {
> + pr_err("can't find md-cluster module or get it's reference.\n");
> spin_unlock(&pers_lock);
> return -ENOENT;
> }
> --
> 2.6.6
Reviewed-by: NeilBrown <neilb@suse.com>
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]
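The control-flow change in the quoted md_setup_cluster() hunk can be
modelled in userspace. This is a rough sketch under stated assumptions:
struct cluster_env and all function names are hypothetical, request_module()
is modelled as failing whenever no loadable module exists (which is what
the quoted log shows for the built-in case), and the try_module_get()
reference counting is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct cluster_env {
	bool ops_registered;  /* models md_cluster_ops != NULL */
	bool module_on_disk;  /* models a loadable md-cluster module */
};

/* Models request_module(): loading the module registers the operations. */
static int model_request_module(struct cluster_env *env)
{
	if (!env->module_on_disk)
		return -ENOENT;
	env->ops_registered = true;
	return 0;
}

/* Old shape: a failed module request is fatal, whatever is registered. */
int setup_cluster_before(struct cluster_env *env)
{
	if (model_request_module(env))
		return -ENOENT;
	return env->ops_registered ? 0 : -ENOENT;
}

/* Patched shape: request the module only if nothing is registered yet,
 * then decide on the registered operations alone. */
int setup_cluster_after(struct cluster_env *env)
{
	if (!env->ops_registered)
		model_request_module(env);
	return env->ops_registered ? 0 : -ENOENT;
}
```

For the built-in case, { .ops_registered = true, .module_on_disk = false },
the old shape returns -ENOENT and the patched shape returns 0. For a
loadable module both succeed, and with no md-cluster at all both fail.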