Linux RAID subsystem development
* Re: Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").
From: Guoqing Jiang @ 2016-09-14  8:25 UTC
  To: Shaohua Li, Christophe JAILLET; +Cc: linux-raid, linux-kernel
In-Reply-To: <20160913172433.GB24264@kernel.org>



On 09/13/2016 01:24 PM, Shaohua Li wrote:
> On Mon, Sep 12, 2016 at 09:09:48PM +0200, Christophe JAILLET wrote:
>> Hi,
>>
>> I'm puzzled by commit f9a67b1182e5 ("md/bitmap: clear bitmap if
>> bitmap_create failed").
> Hi Christophe,
> Thank you very much for helping to check this!
>
>> Part of the commit is:
>>
>> @@ -1865,8 +1866,10 @@ int bitmap_copy_from_slot(struct mddev *mddev, int slot,
>>       struct bitmap_counts *counts;
>>       struct bitmap *bitmap = bitmap_create(mddev, slot);
>>
>> -    if (IS_ERR(bitmap))
>> +    if (IS_ERR(bitmap)) {
>> +        bitmap_free(bitmap);
>>           return PTR_ERR(bitmap);
>> +    }
>>
>> but if 'bitmap' is an error, I think that bad things will happen in
>> 'bitmap_free()' when, at the beginning of the function, we will execute:
>>
>>      if (bitmap->sysfs_can_clear) <-----------------
>>          sysfs_put(bitmap->sysfs_can_clear);

I guess it is safe, since the part below is at the beginning of bitmap_free.

         if (!bitmap) /* there was no bitmap */
                 return;
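
For reference, the NULL test quoted just above and the IS_ERR() test in the
diff check different things. Below is a small model of the kernel's
error-pointer convention (include/linux/err.h), written in Python with
pointers represented as 64-bit integers. It is a sketch, not kernel code; the
helper passes_null_check() is a hypothetical name that stands in for
"if (!bitmap) return;". Under this model an error pointer is non-NULL, so the
NULL test does not return early for it.

```python
# Model of the kernel's error-pointer convention (include/linux/err.h),
# with pointers represented as unsigned 64-bit integers. Assumes a 64-bit
# kernel; MAX_ERRNO is 4095 in the kernel headers.
MAX_ERRNO = 4095
ENOMEM = 12
ADDRESS_MASK = (1 << 64) - 1


def ERR_PTR(error: int) -> int:
    """Encode a negative errno as a pointer value, like the kernel macro."""
    return error & ADDRESS_MASK


def IS_ERR(ptr: int) -> bool:
    """True if ptr lies in the top MAX_ERRNO addresses, i.e. encodes an errno."""
    return ptr >= (-MAX_ERRNO & ADDRESS_MASK)


def PTR_ERR(ptr: int) -> int:
    """Decode the errno from an error pointer."""
    return ptr - (1 << 64)


def passes_null_check(ptr: int) -> bool:
    """Model of `if (!bitmap) return;`: True when execution continues past it."""
    return ptr != 0


bitmap = ERR_PTR(-ENOMEM)
print(hex(bitmap))                # 0xfffffffffffffff4
print(IS_ERR(bitmap))             # True
print(passes_null_check(bitmap))  # True
print(PTR_ERR(bitmap))            # -12
```

In other words, a function that only tests for NULL would go on to
dereference such a value, which is the concern raised about
'bitmap->sysfs_can_clear'.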

> Add Guoqing.
>
> Yeah, you are right, this bitmap_free isn't required. This must be something
> that slipped in with the v2 patch. I'll delete that line.
>
>> However, the commit log message is really explicit and adding this call to
>> 'bitmap_free' has really been done on purpose. ("If bitmap_create returns
>> an error, we need to call either bitmap_destroy or bitmap_free to do clean
>> up, ...")
> This log is a little confusing; I thought it really meant the bitmap_free called
> in bitmap_create. The V1 patch calls bitmap_destroy in bitmap_create.

I double-checked the v1 patch: it called bitmap_destroy once bitmap_create
returned an error inside bitmap_copy_from_slot. Also, bitmap_destroy is not
called in location_store when creating the bitmap fails.

But since bitmap_free within bitmap_create already handles the related
failure, it seems we don't need the patch, and maybe we also don't need the
second line of the comment below (the patch was motivated by that comment, IIRC).

/*
  * initialize the bitmap structure
  * if this returns an error, bitmap_destroy must be called to do clean up
  */

Thanks,
Guoqing


* moving spares into group and checking spares
From: scar @ 2016-09-14  5:18 UTC
  To: linux-raid

i currently have four RAID-5 md arrays which i concatenated into one 
logical volume (lvm2), essentially creating a RAID-50.  each md array 
was created with one spare disk.

instead, i would like to move the four spare disks into one group that 
each of the four arrays can have access to when needed.  i was wondering 
how to safely accomplish this, preferably without unmounting/disrupting 
the filesystem.

secondly, i have the checkarray script scheduled via cron to 
periodically check each of the four arrays.  i noticed in the output of 
checkarray that it doesn't list the spare disk(s).  so i'm guessing they 
are not being checked?  i was wondering, then, how i could also check 
the spare disks to make sure they are healthy and ready to be used if 
needed?
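
For reference, the spares are the members flagged "(S)" in /proc/mdstat. Below
is a minimal Python sketch that lists them per array so they can be inspected
separately. It only parses the text; it does not test disk health. The function
name spares_by_array is made up for illustration, and the regular expressions
assume array names of the form mdN, as in the output in this message.

```python
import re

# Sample array line copied from the /proc/mdstat output in this message (unwrapped).
SAMPLE = (
    "md0 : active raid5 sdb1[0] sdm1[11](S) sdl1[10] sdk1[9] sdj1[8] sdi1[7] "
    "sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1]\n"
)

# A member looks like "sdm1[11]" optionally followed by flags such as "(S)".
MEMBER = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def spares_by_array(mdstat_text: str) -> dict:
    """Map each md array name to the member devices flagged (S) in mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+) : (.*)$", line)
        if not match:
            continue
        name, rest = match.groups()
        result[name] = [
            dev for dev, _slot, flags in MEMBER.findall(rest) if "(S)" in flags
        ]
    return result


print(spares_by_array(SAMPLE))  # {'md0': ['sdm1']}
```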

below is output of /proc/mdstat and mdadm.conf

thanks
--

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md3 : active (auto-read-only) raid5 sdal1[0] sdaw1[11](S) sdav1[10] sdau1[9] sdat1[8] sdas1[7] sdar1[6] sdaq1[5] sdap1[4] sdao1[3] sdan1[2] sdam1[1]
       9766297600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
       bitmap: 0/8 pages [0KB], 65536KB chunk

md2 : active raid5 sdaa1[0] sdak1[11](S) sdz1[10] sdaj1[9] sdai1[8] sdah1[7] sdag1[6] sdaf1[5] sdae1[4] sdad1[3] sdac1[2] sdab1[1]
       9766297600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
       bitmap: 1/8 pages [4KB], 65536KB chunk

md1 : active raid5 sdn1[0] sdy1[11](S) sdx1[10] sdw1[9] sdv1[8] sdu1[7] sdt1[6] sds1[5] sdr1[4] sdq1[3] sdp1[2] sdo1[1]
       9766297600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
       bitmap: 1/8 pages [4KB], 65536KB chunk

md0 : active raid5 sdb1[0] sdm1[11](S) sdl1[10] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1]
       9766297600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
       bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>

# cat /etc/mdadm/mdadm.conf
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md/0  metadata=1.2 UUID=6dd6eba5:50fd8c6d:33ad61ee:e84763a8 name=hind:0
    spares=1
ARRAY /dev/md/1  metadata=1.2 UUID=9336c73a:8b8993bf:ea6cfc3d:bf9f7441 name=hind:1
    spares=1
ARRAY /dev/md/2  metadata=1.2 UUID=817bf91c:4f14fcb0:9ba8b112:768321ee name=hind:2
    spares=1
ARRAY /dev/md/3  metadata=1.2 UUID=1251c6b7:36aca0eb:b66b4c8c:830793ad name=hind:3
    spares=1

#



* Re: Inactive arrays
From: Chris Murphy @ 2016-09-14  4:33 UTC
  To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID
In-Reply-To: <CAHscji1vy55L_BgH=y4X=FMeEgHyxzckhAgLu=BojtTn_2mPug@mail.gmail.com>

On Tue, Sep 13, 2016 at 2:04 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote:

> [root@lamachine ~]# gdisk -l /dev/sdc
> GPT fdisk (gdisk) version 1.0.1
>
> Warning! Disk size is smaller than the main header indicates!

This is true...

> Disk /dev/sdc: 5860531055 sectors, 2.7 TiB
> First usable sector is 2048, last usable sector is 5860533134

The last usable sector LBA is bigger than the total number of LBAs. So
either there's a bug in whatever partitioned this or maybe the
partition map was copied from one disk to another somehow? Hard to
say, but it happened twice as sde has the exact same problem.
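
The mismatch can be checked with a little arithmetic on the gdisk figures
quoted above. The Python sketch below is only illustrative. It assumes the
common GPT layout (128 entries of 128 bytes, 512-byte sectors), so that 32
sectors of backup entries plus one backup header follow the last usable
sector. That layout is an assumption, not something shown in the output.

```python
# Figures copied from the gdisk output quoted above.
DISK_SECTORS = 5860531055          # "Disk /dev/sdc: 5860531055 sectors"
HEADER_LAST_USABLE = 5860533134    # "last usable sector is 5860533134"

# Assumption: 128 entries x 128 bytes = 32 sectors of backup partition
# entries, plus 1 sector for the backup GPT header, after the last usable LBA.
BACKUP_SECTORS = 32 + 1

last_lba = DISK_SECTORS - 1                                   # highest LBA the disk reports
overshoot = HEADER_LAST_USABLE - last_lba                     # how far the header points past it
implied_disk_sectors = HEADER_LAST_USABLE + 1 + BACKUP_SECTORS
missing_sectors = implied_disk_sectors - DISK_SECTORS

print(last_lba)              # 5860531054
print(overshoot)             # 2080
print(implied_disk_sectors)  # 5860533168
print(missing_sectors)       # 2113
```

Under that assumption the header describes a disk of 5860533168 sectors, 2113
more than the drive currently reports.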

> [root@lamachine ~]# gdisk -l /dev/sde
> GPT fdisk (gdisk) version 1.0.1
>
> Warning! Disk size is smaller than the main header indicates!

> Disk /dev/sde: 5860531055 sectors, 2.7 TiB
> First usable sector is 2048, last usable sector is 5860533134

Pretty weird.  Any ideas how that happened? My guess is sdd was
partitioned first, and its partition was copied to sdc and sde, and
the tool blindly did not recompute the last usable sector LBA, it used
the value from sdd.

Anyway...

sdd1 is 2TB
sdd2 is 500MB

And it looks like sdc and sde, if we believe the backup GPT, have the
same exact partition scheme.

sdd1 has mdadm v1.2 metadata indicating it's a raid5 with two other
members, logically that means sdc1 and sde1 are the missing members
for md128.
sdd2 has mdadm v1.2 metadata indicating it's a raid0 with two other
members, logically that means sdc2 and sde2 are the missing members
for md129.

This is consistent with the metadata that's been found on sdd1 and
sdd2. So now the question is really how to go about fixing sdc and sde
partition tables, so that their partitions appear?

Weirdly enough the safest way to fix it is to replace the PMBR with a
conventional MBR with two primary partitions with start and end LBAs
just as gdisk shows them for sdc, sdd and sde. Why? Well, by spec,
even if you don't remove the GPT signatures, if an MBR is present and
is not a protective MBR, then it is supposed to be honored over the
GPT. That'd let you keep the GPT untouched, and only alter the MBR
which right now doesn't contain any valuable information anyway. The
trick though is you need to use an old version of fdisk that won't
check for the GPT first; OR you can use wipefs to wipe the GPT
signatures only, and then use fdisk to create a new MBR (msdos disk
label it's sometimes called).

Hint with wipefs. First, use it with -n to see what it will do. And
then when you're ready to act, replace -n with -b which will create a
backup file for what it's wiping. The signature is a small amount of
data that's easily replaced and not unique to the table being wiped so
it's still possible to use the GPT later should it be necessary. i.e.
everything I'm describing is reversible.

wipefs -a -n /dev/sdc
wipefs -a -n /dev/sde

So what do you get for that?

When you get ready to use fdisk to create the new partitions, since
both use metadata 1.2, use 0xda as the partition type code (although
0x83 is OK also, 0xfd technically only applies to 0.9 metadata for
kernel autodetect).
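
For reference, a classic MBR partition entry is 16 bytes: a status byte, a CHS
start, the type code, a CHS end, then a 32-bit start LBA and a 32-bit sector
count. The Python sketch below packs one such entry with type 0xda. The
function name mbr_entry, the CHS placeholder bytes, and the start and size in
the first call are made up for illustration. The last lines apply the 32-bit
limit to figures from the mdadm -E output for /dev/sdd1 elsewhere in this
thread, on the assumption that the partition is at least Data Offset plus
Avail Dev Size sectors long.

```python
import struct

MAX_U32 = 0xFFFFFFFF  # largest value a 32-bit MBR field can hold


def mbr_entry(start_lba: int, sector_count: int, type_code: int) -> bytes:
    """Pack one 16-byte MBR partition entry (non-bootable, placeholder CHS)."""
    if not 0 <= start_lba <= MAX_U32 or not 0 < sector_count <= MAX_U32:
        raise ValueError("start or size does not fit in a 32-bit MBR field")
    chs_placeholder = b"\xfe\xff\xff"  # CHS is not meaningful at these sizes
    return struct.pack(
        "<B3sB3sII",
        0x00,             # status: not bootable
        chs_placeholder,  # CHS of first sector
        type_code,        # partition type, e.g. 0xda
        chs_placeholder,  # CHS of last sector
        start_lba,        # 32-bit start LBA
        sector_count,     # 32-bit number of sectors
    )


# Hypothetical start and size, for illustration only.
entry = mbr_entry(2048, 1000000, 0xDA)
print(len(entry), hex(entry[4]))  # 16 0xda

# Figures from the mdadm -E output for /dev/sdd1 in this thread:
# Data Offset 262144 sectors + Avail Dev Size 4294705152 sectors.
sdd1_min_sectors = 262144 + 4294705152
print(sdd1_min_sectors, sdd1_min_sectors > MAX_U32)  # 4294967296 True
```

If that assumption holds, the sector count for a partition the size of sdd1
would be one more than a 32-bit field can hold, so it is worth comparing
against the actual start and end LBAs that gdisk prints before writing
anything.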

Chances are you could just use gdisk to verify and fix the primary and
backup GPTs on sdc and sde, it looks like there's nothing in the
vicinity that'll get stepped on by doing this. But the above is one
part being strict about being able to reverse each step, and one part
ass covering.

When it's all done and working with the new MBR you can either leave
it alone, or you can run gdisk on it and it will immediately convert
it (in memory) and you can commit it to disk with the w command to go
back to GPT - which I personally prefer just because there are two
copies of everything, and each copy is separately checksummed.



-- 
Chris Murphy


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13 21:46 UTC
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtRnundJMzxe-7_KrCav-rBHxUw7U0dPifurE4dyXtY59w@mail.gmail.com>

> Sorry, mdadm -E on the individual drives and their partitions. I'm
> curious what metadata it finds on each if any.

[root@lamachine ~]# mdadm -E /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :     61440000 sectors at           63 (type fd)
Partition[1] :    512000000 sectors at     61440063 (type fd)
Partition[2] :    403328002 sectors at    573440063 (type 05)
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
  Creation Time : Thu Dec  3 22:12:12 2009
     Raid Level : raid10
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
     Array Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Sep 13 22:43:06 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ed981d35 - correct
         Events : 264152

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8        1        1      active sync   /dev/sda1

   0     0       8       82        0      active sync   /dev/sdf2
   1     1       8        1        1      active sync   /dev/sda1
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
     Array Size : 511999872 (488.28 GiB 524.29 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2

    Update Time : Tue Sep 13 20:29:02 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 73b16d69 - correct
         Events : 611

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8        2        2      active sync   /dev/sda2

   0     0       8       83        0      active sync   /dev/sdf3
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8        2        2      active sync   /dev/sda2
/dev/sda3:
   MBR Magic : aa55
Partition[0] :     62910589 sectors at           63 (type 83)
Partition[1] :      7116795 sectors at     82445692 (type 05)
/dev/sda5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5
           Name : reading.homeunix.com:3
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
   Raid Devices : 3

 Avail Dev Size : 62908541 (30.00 GiB 32.21 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=0 sectors
          State : clean
    Device UUID : a0efc1b3:94cc6eb8:deea76ca:772b2d2d

    Update Time : Tue Jul 26 19:00:28 2011
       Checksum : 9eba9119 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sda6.
/dev/sdb:
   MBR Magic : aa55
Partition[1] :    512000000 sectors at       409663 (type fd)
Partition[2] :     16384000 sectors at    512409663 (type 82)
Partition[3] :    447974402 sectors at    528793663 (type 05)
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
     Array Size : 511999872 (488.28 GiB 524.29 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2

    Update Time : Tue Sep 13 20:29:02 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 73b16d77 - correct
         Events : 611

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8       83        0      active sync   /dev/sdf3
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8        2        2      active sync   /dev/sda2
mdadm: No md superblock detected on /dev/sdb3.
/dev/sdb4:
   MBR Magic : aa55
Partition[0] :     62912354 sectors at           63 (type 83)
Partition[1] :      7116795 sectors at     82447457 (type 05)
/dev/sdb5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5
           Name : reading.homeunix.com:3
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
   Raid Devices : 3

 Avail Dev Size : 62910306 (30.00 GiB 32.21 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=0 sectors
          State : clean
    Device UUID : 152d0202:64efb3e7:f23658c3:82a239a1

    Update Time : Tue Jul 26 19:00:28 2011
       Checksum : 892dbb61 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sdb6.
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
           Name : lamachine:128  (local to host lamachine)
  Creation Time : Fri Oct 24 15:24:38 2014
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 4294705152 (2047.88 GiB 2198.89 GB)
     Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 1f652d4f:92fccd8e:b439abf2:76b881e1

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Feb  4 18:55:34 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ed602b13 - correct
         Events : 4154

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
           Name : lamachine:129  (local to host lamachine)
  Creation Time : Mon Nov 10 16:28:11 2014
     Raid Level : raid0
   Raid Devices : 3

 Avail Dev Size : 1048313856 (499.88 GiB 536.74 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 562dd382:5ccc00aa:449ea7e4:d8b266c2

    Update Time : Mon Nov 10 16:28:11 2014
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 937158c1 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdf:
   MBR Magic : aa55
Partition[0] :       407552 sectors at         2048 (type 83)
Partition[1] :     61440000 sectors at       409663 (type fd)
Partition[2] :    512000000 sectors at     61849663 (type fd)
Partition[3] :    402918402 sectors at    573849663 (type 05)
mdadm: No md superblock detected on /dev/sdf1.
/dev/sdf2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
  Creation Time : Thu Dec  3 22:12:12 2009
     Raid Level : raid10
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
     Array Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Sep 13 22:43:06 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ed981d84 - correct
         Events : 264152

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       82        0      active sync   /dev/sdf2

   0     0       8       82        0      active sync   /dev/sdf2
   1     1       8        1        1      active sync   /dev/sda1
/dev/sdf3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
     Array Size : 511999872 (488.28 GiB 524.29 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2

    Update Time : Tue Sep 13 20:29:02 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 73b16db6 - correct
         Events : 611

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       83        0      active sync   /dev/sdf3

   0     0       8       83        0      active sync   /dev/sdf3
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8        2        2      active sync   /dev/sda2
/dev/sdf4:
   MBR Magic : aa55
Partition[0] :     62918679 sectors at           63 (type 83)
Partition[1] :      7116795 sectors at     82453782 (type 05)
/dev/sdf5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5
           Name : reading.homeunix.com:3
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
   Raid Devices : 3

 Avail Dev Size : 62916631 (30.00 GiB 32.21 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=0 sectors
          State : clean
    Device UUID : 5778cd64:0bbba183:ef3270a8:41f83aca

    Update Time : Tue Jul 26 19:00:28 2011
       Checksum : 96003cba - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sdf6.
[root@lamachine ~]#


* Re: Inactive arrays
From: Wols Lists @ 2016-09-13 21:26 UTC
  To: Daniel Sanabria, Chris Murphy; +Cc: Linux-RAID
In-Reply-To: <CAHscji3LX3+c_ighYMMDQo1E4ygdepr=Gt2wVmWzdLj+F+tVsQ@mail.gmail.com>

On 13/09/16 21:36, Daniel Sanabria wrote:
> [root@lamachine ~]# mdadm -D /dev/sd*
> mdadm: /dev/sda does not appear to be an md device
> mdadm: /dev/sda1 does not appear to be an md device
> mdadm: /dev/sda2 does not appear to be an md device
> mdadm: /dev/sda3 does not appear to be an md device
> mdadm: /dev/sda5 does not appear to be an md device
> mdadm: /dev/sda6 does not appear to be an md device
> mdadm: /dev/sdb does not appear to be an md device

I think that it's been pointed out, but this should be "mdadm -E".

"mdadm -E" gets passed physical devices eg /dev/sda, "mdadm -D" gets
passed raid devices eg /dev/md127.
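
A small Python sketch of that rule, for illustration only: it builds the
command line without running it. The function name mdadm_query and the
name-based test are assumptions; it treats /dev/mdN and anything under
/dev/md/ as an assembled array and everything else as a member device.
--detail and --examine are the long forms of -D and -E.

```python
import re


def mdadm_query(device: str) -> list:
    """Build an mdadm command line: --detail for an assembled array,
    --examine for a member disk or partition."""
    # Assumption: arrays are named /dev/mdN or live under /dev/md/.
    is_array = re.match(r"^/dev/md(\d+|/.+)$", device) is not None
    return ["mdadm", "--detail" if is_array else "--examine", device]


for dev in ("/dev/md127", "/dev/md/0", "/dev/sda", "/dev/sdd1"):
    print(" ".join(mdadm_query(dev)))
# mdadm --detail /dev/md127
# mdadm --detail /dev/md/0
# mdadm --examine /dev/sda
# mdadm --examine /dev/sdd1
```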

Confusing, I know ...

Cheers,
Wol


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-13 21:10 UTC
  To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID
In-Reply-To: <CAHscji3LX3+c_ighYMMDQo1E4ygdepr=Gt2wVmWzdLj+F+tVsQ@mail.gmail.com>

On Tue, Sep 13, 2016 at 2:36 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> Maybe start out with 'mdadm -D' on everything... literally everything,
>> every whole drive (i.e. /dev/sdd, /dev/sdc, all of them)

Sorry, mdadm -E on the individual drives and their partitions. I'm
curious what metadata it finds on each if any.
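
One way to read a long "mdadm -E" listing is to group devices by the array
UUID in their superblock. The Python sketch below does that for an abbreviated
excerpt in the format of the "mdadm -E" output posted in this thread. The
function name members_by_array is made up for illustration, and the parsing
relies only on the "UUID" (0.90 metadata) and "Array UUID" (1.x metadata)
lines seen in that output.

```python
import re
from collections import defaultdict

# Abbreviated excerpt in the format of `mdadm -E` output from this thread.
SAMPLE = """\
/dev/sda2:
        Version : 0.90.00
           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
/dev/sdd1:
        Version : 1.2
     Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
    Device UUID : 1f652d4f:92fccd8e:b439abf2:76b881e1
/dev/sdb2:
        Version : 0.90.00
           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
"""


def members_by_array(examine_text: str) -> dict:
    """Group device names by the array UUID found in their md superblock."""
    groups = defaultdict(list)
    device = None
    for line in examine_text.splitlines():
        header = re.match(r"^(/dev/\S+):$", line)
        if header:
            device = header.group(1)
            continue
        # 0.90 superblocks print "UUID", 1.x print "Array UUID";
        # "Device UUID" lines do not match and are skipped.
        uuid = re.match(r"^\s*(?:Array )?UUID : ([0-9a-f:]+)", line)
        if uuid and device:
            groups[uuid.group(1)].append(device)
    return dict(groups)


for array_uuid, devices in members_by_array(SAMPLE).items():
    print(array_uuid, devices)
# 2cff15d1:e411447b:fd5d4721:03e44022 ['/dev/sda2', '/dev/sdb2']
# f2372cb9:d3816fd6:ce86d826:882ec82e ['/dev/sdd1']
```

An array whose UUID lists fewer devices than its "Raid Devices" count is
missing members.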





-- 
Chris Murphy


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13 20:36 UTC
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtT_4O8r-cJQaw7P-47ekawYUH7cuw-30vbB8J_J+Kf+WQ@mail.gmail.com>

> Maybe start out with 'mdadm -D' on everything... literally everything,
> every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) and also
> every one of their partitions; and see if it's possible to sort out
> this mess.

[root@lamachine ~]# mdadm -D /dev/md*
/dev/md126:
        Version : 0.90
  Creation Time : Thu Dec  3 22:12:12 2009
     Raid Level : raid10
     Array Size : 30719936 (29.30 GiB 31.46 GB)
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Tue Sep 13 21:33:13 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
         Events : 0.264152

    Number   Major   Minor   RaidDevice State
       0       8       82        0      active sync set-A   /dev/sdf2
       1       8        1        1      active sync set-B   /dev/sda1
/dev/md127:
        Version : 1.2
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
     Array Size : 94367232 (90.00 GiB 96.63 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 19:00:28 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : reading.homeunix.com:3
           UUID : acd5374f:72628c93:6a906c4b:5f675ce5
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       85        0      active sync   /dev/sdf5
       1       8       21        1      active sync   /dev/sdb5
       2       8        5        2      active sync   /dev/sda5
/dev/md128:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:128  (local to host lamachine)
           UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
         Events : 4154

    Number   Major   Minor   RaidDevice

       -       8       49        -        /dev/sdd1
/dev/md129:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:129  (local to host lamachine)
           UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
         Events : 0

    Number   Major   Minor   RaidDevice

       -       8       50        -        /dev/sdd2
/dev/md2:
        Version : 0.90
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
     Array Size : 511999872 (488.28 GiB 524.29 GB)
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Sep 13 20:29:02 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
         Events : 0.611

    Number   Major   Minor   RaidDevice State
       0       8       83        0      active sync   /dev/sdf3
       1       8       18        1      active sync   /dev/sdb2
       2       8        2        2      active sync   /dev/sda2
[root@lamachine ~]# mdadm -D /dev/sd*
mdadm: /dev/sda does not appear to be an md device
mdadm: /dev/sda1 does not appear to be an md device
mdadm: /dev/sda2 does not appear to be an md device
mdadm: /dev/sda3 does not appear to be an md device
mdadm: /dev/sda5 does not appear to be an md device
mdadm: /dev/sda6 does not appear to be an md device
mdadm: /dev/sdb does not appear to be an md device
mdadm: /dev/sdb2 does not appear to be an md device
mdadm: /dev/sdb3 does not appear to be an md device
mdadm: /dev/sdb4 does not appear to be an md device
mdadm: /dev/sdb5 does not appear to be an md device
mdadm: /dev/sdb6 does not appear to be an md device
mdadm: /dev/sdc does not appear to be an md device
mdadm: /dev/sdd does not appear to be an md device
mdadm: /dev/sdd1 does not appear to be an md device
mdadm: /dev/sdd2 does not appear to be an md device
mdadm: /dev/sde does not appear to be an md device
mdadm: /dev/sdf does not appear to be an md device
mdadm: /dev/sdf1 does not appear to be an md device
mdadm: /dev/sdf2 does not appear to be an md device
mdadm: /dev/sdf3 does not appear to be an md device
mdadm: /dev/sdf4 does not appear to be an md device
mdadm: /dev/sdf5 does not appear to be an md device
mdadm: /dev/sdf6 does not appear to be an md device
[root@lamachine ~]#


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13 20:29 UTC
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtT_4O8r-cJQaw7P-47ekawYUH7cuw-30vbB8J_J+Kf+WQ@mail.gmail.com>

Thanks for the help Chris,

> Have you told us the entire story about how you got into
> this situation?

I think I have, but I can see how it can be confusing since I have
provided unrequested info, including old records from when the arrays
were working (more on that below). Basically, the system was moved,
meaning it was offline for a few days; on first boot after the move I
ended up with md128 and md129 inactive.

> Have you used 'mdadm create' trying to fix this? If you
> haven't, don't do it.

I haven't

> I see a lot of conflicting information. For example:
>
>> /dev/md129:
>>         Version : 1.2
>>   Creation Time : Mon Nov 10 16:28:11 2014
>>      Raid Level : raid0
>>      Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
>>    Raid Devices : 3
>>   Total Devices : 3
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Mon Nov 10 16:28:11 2014
>>           State : clean
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>      Chunk Size : 512K
>>
>>            Name : lamachine:129  (local to host lamachine)
>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       50        0      active sync   /dev/sdd2
>>        1       8       66        1      active sync   /dev/sde2
>>        2       8       82        2      active sync   /dev/sdf
>
>
>
>>> /dev/md129:
>>>         Version : 1.2
>>>      Raid Level : raid0
>>>   Total Devices : 1
>>>     Persistence : Superblock is persistent
>>>
>>>           State : inactive
>>>
>>>            Name : lamachine:129  (local to host lamachine)
>>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>          Events : 0
>>>
>>>     Number   Major   Minor   RaidDevice
>>>
>>>        -       8       50        -        /dev/sdd2
>
>
> The same md device, one raid0 one raid5. The same sdd2, one in the
> raid0, and it's also in the raid5. Which is true?

So the first record for /dev/md129 is from the time the array was
working OK and the second is the current status. I think both records
show Raid Level: raid0.

> It sounds to me like
> you've tried recovery and did something wrong; or about as bad is
> you've had these drives in more than one software raid setup, and you
> didn't zero out old superblocks first.

The only thing that comes to mind is that at first the system wasn't
coming up, so I tried to boot from individual drives while
trying to locate the boot device.

> Maybe start out with 'mdadm -D' on everything... literally everything,
> every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) and also
> every one of their partitions; and see if it's possible to sort out
> this mess.

Will run on devices "a to f"


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-13 20:13 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID
In-Reply-To: <CAHscji1vy55L_BgH=y4X=FMeEgHyxzckhAgLu=BojtTn_2mPug@mail.gmail.com>

An invalid backup GPT suggests it was stepped on by something that was
used on the whole block device. The backup GPT is at the end of the
drive. And if you were to use 'mdadm --create' on the entire drive
rather than a partition, you'd step on that GPT and also incorrectly
recreate the array. Have you told us the entire story about how you
got into this situation? Have you used 'mdadm --create' trying to fix
this? If you haven't, don't do it.

I see a lot of conflicting information. For example:

> /dev/md129:
>         Version : 1.2
>   Creation Time : Mon Nov 10 16:28:11 2014
>      Raid Level : raid0
>      Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Nov 10 16:28:11 2014
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : lamachine:129  (local to host lamachine)
>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>          Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       50        0      active sync   /dev/sdd2
>        1       8       66        1      active sync   /dev/sde2
>        2       8       82        2      active sync   /dev/sdf



>> /dev/md129:
>>         Version : 1.2
>>      Raid Level : raid0
>>   Total Devices : 1
>>     Persistence : Superblock is persistent
>>
>>           State : inactive
>>
>>            Name : lamachine:129  (local to host lamachine)
>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       50        -        /dev/sdd2


The same md device, one raid0 one raid5. The same sdd2, one in the
raid0, and it's also in the raid5. Which is true? It sounds to me like
you've tried recovery and did something wrong; or, about as bad,
you've had these drives in more than one software raid setup and
didn't zero out the old superblocks first. If you leave old signatures
intact you end up with this sort of ambiguity: which signature is
correct? So now you have to figure out which one is correct and which
one is wrong...

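Maybe start out with 'mdadm -E' on everything... literally everything,
every whole drive (e.g. /dev/sdd, /dev/sdc, all of them) and also
every one of their partitions; and see if it's possible to sort out
this mess. (-E examines the superblock on a component device; -D only
works on an assembled md device such as /dev/md129.)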
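
A minimal sketch of that survey, assuming the drives all show up as
/dev/sd* (the function names are made up for illustration and are not
part of mdadm). 'mdadm --examine' only reads the superblock from a
component device, so this should be safe to run on everything:

```shell
#!/bin/sh
# Sketch only: list_md_candidates and examine_all are illustrative names.

# Print every whole disk and partition node matching sd* in a /dev-like
# directory (default: /dev).
list_md_candidates() {
    dev_dir="${1:-/dev}"
    for dev in "$dev_dir"/sd*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        printf '%s\n' "$dev"
    done
}

# Read-only: dump whatever md superblock each candidate carries.
examine_all() {
    list_md_candidates "$1" | while read -r dev; do
        echo "=== $dev ==="
        mdadm --examine "$dev" 2>&1
    done
}
```

Running 'examine_all > examine.txt' as root and comparing the UUID,
Raid Level and Events fields per device should show which superblocks
belong together and which ones are stale.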


Chris Murphy


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13 20:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtS-aYNgy7MM990CTKFqcNk8bf+ggzX3QRrD+yQZfdVxnA@mail.gmail.com>

> Yeah that looks like a recent boot; if that's a boot where you'd run
> parted and got those errors on read, then I don't have a good
> explanation why you're getting parted errors that don't have matching
> kernel messages, i.e. something from libata about the drive not liking
> the command or not properly reading from the drive, etc.

Let me see if I can find something.

> What do you get for gdisk -l <dev> for each of these drives?

[root@lamachine ~]# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 1.0.1

Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sdc: 5860531055 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 6DB70F4E-D8ED-4290-AA2E-4E81D8324992
Partition table holds up to 128 entries
First usable sector is 2048, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 516987791 sectors (246.5 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      4294969343   2.0 TiB     FD00
   2      4294969344      5343545343   500.0 GiB   8300
[root@lamachine ~]# gdisk -l /dev/sdd
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdd: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): D3233810-F552-4126-8281-7F71A4938DF9
Partition table holds up to 128 entries
First usable sector is 2048, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 516987791 sectors (246.5 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      4294969343   2.0 TiB     FD00
   2      4294969344      5343545343   500.0 GiB   8300
[root@lamachine ~]# gdisk -l /dev/sde
GPT fdisk (gdisk) version 1.0.1

Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sde: 5860531055 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): B64DAA7C-E1D8-4E8A-A5C8-76001DAE6B30
Partition table holds up to 128 entries
First usable sector is 2048, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 516987791 sectors (246.5 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      4294969343   2.0 TiB     FD00
   2      4294969344      5343545343   500.0 GiB   8300
[root@lamachine ~]#
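
The numbers above can be cross-checked with a little arithmetic. This
is only a sketch, assuming the standard GPT layout gdisk reports here
(128 entries of 128 bytes each, i.e. 32 sectors at 512 bytes, followed
by one backup header sector); the function names are illustrative:

```shell
# Sectors a GPT expects the disk to have, given its last usable sector:
# last usable + 32 sectors of backup partition entries + 1 backup header
# sector gives the last LBA; add 1 to turn that into a sector count.
gpt_expected_sectors() {
    echo $(( $1 + 32 + 1 + 1 ))
}

# How many sectors short the disk is of what the GPT expects.
# $1: last usable sector, $2: sector count the disk actually reports
gpt_missing_sectors() {
    expected=$(gpt_expected_sectors "$1")
    echo $(( expected - $2 ))
}
```

'gpt_expected_sectors 5860533134' gives 5860533168, which is exactly
what sdd reports. 'gpt_missing_sectors 5860533134 5860531055' gives
2113, so sdc and sde appear about 1 MiB shorter than the disk their
partition tables were written for, which would explain why gdisk
cannot find a valid backup header at the last sector.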

On 13 September 2016 at 20:52, Chris Murphy <lists@colorremedies.com> wrote:
> On Tue, Sep 13, 2016 at 1:43 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> This is a problem. What do you get for
>>>
>>> cat /sys/block/sdc/device/timeout
>>
>> [root@lamachine ~]# cat /sys/block/sdc/device/timeout
>> 30
>> [root@lamachine ~]# cat /sys/block/sdd/device/timeout
>> 30
>> [root@lamachine ~]# cat /sys/block/sde/device/timeout
>> 30
>> [root@lamachine ~]#
>
> Common and often fatal misconfiguration. Since the drives don't
> support SCT ERC, the command timer needs to be changed to something
> higher. Without the benefit of historical kernel messages, it's
> unclear if there have been any link resets that'd indicate improper
> correction for bad sectors on the drives.
>
>
>
>
>>
>>> Anyone specifically familiar with WDC Greens, and if the lack of SCT
>>> ERC can be worked around in the usual way by increasing the SCSI
>>> command timer value? Or is there also something else? I vaguely recall
>>> something about drive spin down that can also cause issues, does that
>>> need mitigation? If no one chimes in, this information is in the
>>> archives, just search for 'WDC green' and you'll get an shittonne of
>>> results.
>>
>> In another thread I found Phil Turmel recommending to change the
>> timeout value like this:
>>
>> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>>
>> Is that what you guys are talking about when mentioning the SCT/ERC issues?
>
> Yes. You should do that.
>
>
>
>
>>
>>> OK so the next thing I want to see is why you're getting these
>>> messages from parted when you check sdc and sde for partition maps. At
>>> the time you do this, what do you see in kernel messages? Maybe best
>>> to just stick the entire dmesg for the current boot up somewhere like
>>> fpaste.org or equivalent.
>>
>> https://paste.fedoraproject.org/427719/37952531/
>
> Yeah that looks like a recent boot; if that's a boot where you'd run
> parted and got those errors on read, then I don't have a good
> explanation why you're getting parted errors that don't have matching
> kernel messages, i.e. something from libata about the drive not liking
> the command or not properly reading from the drive, etc.
>
> What do you get for gdisk -l <dev> for each of these drives?
>
>
>
> --
> Chris Murphy


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-13 19:52 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID
In-Reply-To: <CAHscji0H7kEDzhg3KpUYrdk5MA06k-F5Xe95=EL7mo-_mS3y5A@mail.gmail.com>

On Tue, Sep 13, 2016 at 1:43 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> This is a problem. What do you get for
>>
>> cat /sys/block/sdc/device/timeout
>
> [root@lamachine ~]# cat /sys/block/sdc/device/timeout
> 30
> [root@lamachine ~]# cat /sys/block/sdd/device/timeout
> 30
> [root@lamachine ~]# cat /sys/block/sde/device/timeout
> 30
> [root@lamachine ~]#

Common and often fatal misconfiguration. Since the drives don't
support SCT ERC, the command timer needs to be raised from the 30
second default to something longer than the drive's own worst-case
error recovery time. Without the benefit of historical kernel
messages, it's unclear if there have been any link resets that'd
indicate improper correction for bad sectors on the drives.




>
>> Anyone specifically familiar with WDC Greens, and if the lack of SCT
>> ERC can be worked around in the usual way by increasing the SCSI
>> command timer value? Or is there also something else? I vaguely recall
>> something about drive spin down that can also cause issues, does that
>> need mitigation? If no one chimes in, this information is in the
>> archives, just search for 'WDC green' and you'll get an shittonne of
>> results.
>
> In another thread I found Phil Turmel recommending to change the
> timeout value like this:
>
> for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
>
> Is that what you guys are talking about when mentioning the SCT/ERC issues?

Yes. You should do that.
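
A slightly more careful version of that loop, as a sketch: it only
touches sd* devices and prints the old value as it goes. The function
name and the optional sysfs-directory argument are additions for
illustration. The setting does not survive a reboot, so it has to be
reapplied at every boot (rc.local, a udev rule, or similar).

```shell
# Raise the SCSI command timer for every sd* disk.
# $1: timeout in seconds (default 180)
# $2: sysfs block directory (default /sys/block; overridable for dry runs)
set_command_timeouts() {
    secs="${1:-180}"
    sys_block="${2:-/sys/block}"
    for f in "$sys_block"/sd*/device/timeout; do
        [ -w "$f" ] || continue          # skip missing or read-only entries
        old=$(cat "$f")
        echo "$secs" > "$f"
        echo "${f}: ${old} -> ${secs}"
    done
}
```

Called as root with no arguments, 'set_command_timeouts' should leave
sdc, sdd and sde at 180 instead of 30.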




>
>> OK so the next thing I want to see is why you're getting these
>> messages from parted when you check sdc and sde for partition maps. At
>> the time you do this, what do you see in kernel messages? Maybe best
>> to just stick the entire dmesg for the current boot up somewhere like
>> fpaste.org or equivalent.
>
> https://paste.fedoraproject.org/427719/37952531/

Yeah that looks like a recent boot; if that's a boot where you'd run
parted and got those errors on read, then I don't have a good
explanation why you're getting parted errors that don't have matching
kernel messages, i.e. something from libata about the drive not liking
the command or not properly reading from the drive, etc.

What do you get for gdisk -l <dev> for each of these drives?



-- 
Chris Murphy


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13 19:43 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtTzPpt0krPxCZY_TB-Xa6PcG0YsdDPnneq5dEB30wy8hw@mail.gmail.com>

> This is a problem. What do you get for
>
> cat /sys/block/sdc/device/timeout

[root@lamachine ~]# cat /sys/block/sdc/device/timeout
30
[root@lamachine ~]# cat /sys/block/sdd/device/timeout
30
[root@lamachine ~]# cat /sys/block/sde/device/timeout
30
[root@lamachine ~]#

> Anyone specifically familiar with WDC Greens, and if the lack of SCT
> ERC can be worked around in the usual way by increasing the SCSI
> command timer value? Or is there also something else? I vaguely recall
> something about drive spin down that can also cause issues, does that
> need mitigation? If no one chimes in, this information is in the
> archives, just search for 'WDC green' and you'll get an shittonne of
> results.

In another thread I found Phil Turmel recommending to change the
timeout value like this:

for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done

Is that what you guys are talking about when mentioning the SCT/ERC issues?

> OK so the next thing I want to see is why you're getting these
> messages from parted when you check sdc and sde for partition maps. At
> the time you do this, what do you see in kernel messages? Maybe best
> to just stick the entire dmesg for the current boot up somewhere like
> fpaste.org or equivalent.

https://paste.fedoraproject.org/427719/37952531/


* Re: RAID6 - CPU At 100% Usage After Reassembly
From: Francisco Parada @ 2016-09-13 18:15 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Michael J. Shaver, mdraid
In-Reply-To: <20160913174352.GA43576@kernel.org>

Hi Shaohua,

>If you could rebuild kernel and apply this debug patch

It would be my pleasure to do this.  Thank you for your reply!
Do you have a decent reference I could follow for this?  I've been to
a ton of pages in the past, but none concise enough to walk me through
rebuilding the kernel and applying a patch like you ask.  If not, no
worries, I'll just do a bit of digging and reading.  I am working from
home today though, so if I could bang this one out, it would be
amazing!

>please capture the /sys/kernel/debug/tracing/trace output and send to me.

No problem, once I do this I will send that right over.


On Tue, Sep 13, 2016 at 1:43 PM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Sep 13, 2016 at 11:22:17AM -0400, Francisco Parada wrote:
>> Hi all,
>>
>> I've tried the suggestions mentioned and also tried booting into an
>> older version of Ubuntu (14.04.5 instead of my current 16.04.1), which
>> has an older kernel per a private suggestion, but that too was
>> unsuccessful.  Just wanted to give a quick update that all of the
>> mentioned workarounds were unfruitful.
>
> If you could rebuild kernel and apply this debug patch, I'd like to check what
> happens. When the cpu is in 100% usage and reshape stalls, please capture the
> /sys/kernel/debug/tracing/trace output and send to me.
>
> Thanks,
> Shaohua
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5883ef0..db484ca 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -62,6 +62,9 @@
>  #include "raid0.h"
>  #include "bitmap.h"
>
> +#undef pr_debug
> +#define pr_debug trace_printk
> +
>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>  #define ANY_GROUP NUMA_NO_NODE
>


* Re: RAID6 - CPU At 100% Usage After Reassembly
From: Shaohua Li @ 2016-09-13 17:43 UTC (permalink / raw)
  To: Francisco Parada; +Cc: Michael J. Shaver, mdraid
In-Reply-To: <CAOW94utnY1zCR7gUCx1K0p2YfDhi3tXTsE6J_mT6rNz1TuTZEQ@mail.gmail.com>

On Tue, Sep 13, 2016 at 11:22:17AM -0400, Francisco Parada wrote:
> Hi all,
> 
> I've tried the suggestions mentioned and also tried booting into an
> older version of Ubuntu (14.04.5 instead of my current 16.04.1), which
> has an older kernel per a private suggestion, but that too was
> unsuccessful.  Just wanted to give a quick update that all of the
> mentioned workarounds were unfruitful.

If you could rebuild the kernel with this debug patch applied, I'd like to
check what happens. When the CPU is at 100% usage and the reshape stalls,
please capture the /sys/kernel/debug/tracing/trace output and send it to me.
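
For the capture step, a minimal sketch, assuming debugfs is mounted at
/sys/kernel/debug and the command is run as root. The function name and
the optional second argument are illustrative only:

```shell
# Copy the ftrace buffer to a file.
# $1: output file
# $2: trace file (default /sys/kernel/debug/tracing/trace)
capture_trace() {
    out="$1"
    trace="${2:-/sys/kernel/debug/tracing/trace}"
    if [ ! -r "$trace" ]; then
        echo "cannot read $trace (is debugfs mounted? are you root?)" >&2
        return 1
    fi
    cat "$trace" > "$out"
}
```

If the trace file is missing, 'mount -t debugfs none /sys/kernel/debug'
normally makes it appear. The output can get large, so compressing it
before mailing is probably a good idea.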

Thanks,
Shaohua

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5883ef0..db484ca 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -62,6 +62,9 @@
 #include "raid0.h"
 #include "bitmap.h"
 
+#undef pr_debug
+#define pr_debug trace_printk
+
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
 


* Re: Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").
From: Shaohua Li @ 2016-09-13 17:24 UTC (permalink / raw)
  To: Christophe JAILLET; +Cc: linux-raid, linux-kernel, gqjiang
In-Reply-To: <752ab1d9-412a-149b-a241-e604040ebaff@wanadoo.fr>

On Mon, Sep 12, 2016 at 09:09:48PM +0200, Christophe JAILLET wrote:
> Hi,
> 
> I'm puzzled by commit f9a67b1182e5 ("md/bitmap: clear bitmap if
> bitmap_create failed").
Hi Christophe,
Thank you very much for helping to check this!

> Part of the commit is:
> 
> @@ -1865,8 +1866,10 @@ int bitmap_copy_from_slot(struct mddev *mddev, int
> slot,
>      struct bitmap_counts *counts;
>      struct bitmap *bitmap = bitmap_create(mddev, slot);
> 
> -    if (IS_ERR(bitmap))
> +    if (IS_ERR(bitmap)) {
> +        bitmap_free(bitmap);
>          return PTR_ERR(bitmap);
> +    }
> 
> but if 'bitmap' is an error, I think that bad things will happen in
> 'bitmap_free()' when, at the beginning of the function, we will execute:
> 
>     if (bitmap->sysfs_can_clear) <-----------------
>         sysfs_put(bitmap->sysfs_can_clear);

Adding Guoqing.

Yeah, you are right, this bitmap_free isn't required. It must have slipped
in with the v2 patch. I'll delete that line.

> However, the commit log message is really explicit and adding this call to
> 'bitmap_free' has really been done one purpose. ("If bitmap_create returns
> an error, we need to call either bitmap_destroy or bitmap_free to do clean
> up, ...")

The log is a little confusing; I think it really refers to the bitmap_free
called inside bitmap_create. The v1 patch called bitmap_destroy in
bitmap_create.

Thanks,
Shaohua


* [GIT PULL] MD update for 4.8-rc6
From: Shaohua Li @ 2016-09-13 16:56 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, linux-raid, neilb

Hi Linus,
A few bug fixes for MD.
- Guoqing fixed a bug when md-cluster is compiled into the kernel
- I fixed a potential deadlock in raid5-cache superblock write, a hang in raid5
  reshape resume and a race condition introduced in -rc4

Please pull!

Thanks,
Shaohua

The following changes since commit 86a1679860babbacd61fc1e8c0c0f43641d5860d:

  Merge tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md (2016-08-30 11:24:04 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.8-rc6

for you to fetch changes up to c94455558337eece474eebb6a16b905f98930418:

  raid5: fix a small race condition (2016-09-09 11:09:19 -0700)

----------------------------------------------------------------
Guoqing Jiang (1):
      md-cluster: make md-cluster also can work when compiled into kernel

Shaohua Li (3):
      raid5-cache: fix a deadlock in superblock write
      raid5: guarantee enough stripes to avoid reshape hang
      raid5: fix a small race condition

 drivers/md/md.c          | 12 ++++--------
 drivers/md/raid5-cache.c | 46 +++++++++++++++-------------------------------
 drivers/md/raid5.c       | 14 ++++++++++++--
 3 files changed, 31 insertions(+), 41 deletions(-)


* Re: RAID6 - CPU At 100% Usage After Reassembly
From: Francisco Parada @ 2016-09-13 15:22 UTC (permalink / raw)
  To: Michael J. Shaver; +Cc: mdraid
In-Reply-To: <CAOW94ute6CZafnAC6AVauqUGsEgVrnO3B27Gn3eVzdWFBJPCQw@mail.gmail.com>

Hi all,

I've tried the suggestions mentioned and also tried booting into an
older version of Ubuntu (14.04.5 instead of my current 16.04.1), which
has an older kernel per a private suggestion, but that too was
unsuccessful.  Just wanted to give a quick update that all of the
mentioned workarounds were unfruitful.

Does anyone else have any suggestions or workarounds?


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-13 15:20 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAHscji3KfFfO8CxZTbLX6u-bfA57pOR5i6qvc0NhNODH5tGeEg@mail.gmail.com>

On Tue, Sep 13, 2016 at 12:56 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote:

>
> [root@lamachine ~]# smartctl -x /dev/sdc
> smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
> Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Green
> Device Model:     WDC WD30EZRX-00D8PB0
> Serial Number:    WD-WCC4NCWT13RF

> SCT Error Recovery Control command not supported

This is a problem. What do you get for

cat /sys/block/sdc/device/timeout


>
> [root@lamachine ~]# smartctl -x /dev/sdd
> smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
> Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Green
> Device Model:     WDC WD30EZRX-00D8PB0
> Serial Number:    WD-WCC4NPRDD6D7

>
> SCT Error Recovery Control command not supported

Same for sdd.


> [root@lamachine ~]# smartctl -x /dev/sde
> smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
> Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Green
> Device Model:     WDC WD30EZRX-00D8PB0
> Serial Number:    WD-WCC4N1294906


> SCT Error Recovery Control command not supported

And sde.

But otherwise no other issues.

Note that WDC Greens are explicitly disqualified for use in RAID of
any level by the manufacturer. We can complain til the cows come home
but the reality is the manufacturer does not stand behind this product
at all in RAID configurations.

Is anyone specifically familiar with WDC Greens, and whether the lack
of SCT ERC can be worked around in the usual way by increasing the
SCSI command timer value? Or is there also something else? I vaguely
recall something about drive spin down that can also cause issues;
does that need mitigation? If no one chimes in, this information is in
the archives, just search for 'WDC green' and you'll get a shittonne
of results.

OK so the next thing I want to see is why you're getting these
messages from parted when you check sdc and sde for partition maps. At
the time you do this, what do you see in kernel messages? Maybe best
to just stick the entire dmesg for the current boot up somewhere like
fpaste.org or equivalent.



-- 
Chris Murphy


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-13 15:04 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID
In-Reply-To: <CAHscji2pAt1_yPpXj6gKE-sin0Db3RAo56srM8cye5ub4rSLXg@mail.gmail.com>

On Tue, Sep 13, 2016 at 12:51 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> What version of parted?
>
> parted (GNU parted) 3.2

That should reliably fix the GPT, assuming it's appropriate to fix it
- which isn't necessarily a good assumption. The backup GPT is located
in about the same place as mdadm v0.90 and v1.0 metadata, at the end
of the device. So one can step on the other, depending on how the
array is created (on whole devices or on partitions). The problem with
unwinding this is that if any of these structures becomes stale
without having its signature wiped, it becomes important to find out
for certain which one is valid and which one is stale.

>
>> Are these two drives on the same
>> controller?
>
> yes afaict

I'd check all the cables (the connections in particular) from drive to
controller, and that the controller is seated. It seems sorta unlikely
that a hardware problem would hit two drives the same way at the same
time, unless they share something the other drives don't, or some
event has modified both drives the same way. Thing is, invalid
argument during seek for read? Kinda sounds like a hardware problem.

-- 
Chris Murphy


* Re: Inactive arrays
From: Adam Goryachev @ 2016-09-13  7:02 UTC (permalink / raw)
  To: Daniel Sanabria, Wols Lists; +Cc: Linux-RAID
In-Reply-To: <CAHscji3KfFfO8CxZTbLX6u-bfA57pOR5i6qvc0NhNODH5tGeEg@mail.gmail.com>

On 13/09/16 16:56, Daniel Sanabria wrote:
>> What does smartctl report on sdc and sde (I think you want smartctl -x,
>> the extended "display everything" command)?
> [root@lamachine ~]# smartctl -x /dev/{sdc,sdd,sde}
> smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
> Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
>
> ERROR: smartctl takes ONE device name as the final command-line argument.
> You have provided 3 device names:
> /dev/sdc
> /dev/sdd
> /dev/sde
>
> Use smartctl -h to get a usage summary
>
> [root@lamachine ~]# smartctl -x /dev/sdc
> smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
> Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Green

Make sure you have read about linux raid and SCT/ERC and fixed the 
timeout issue for these drives. Do that immediately, before you bother 
trying to recover anything, or do anything else (or else recovery will 
probably fail, things will get worse, the sky will fall in...)

> Device Model:     WDC WD30EZRX-00D8PB0
> Serial Number:    WD-WCC4NCWT13RF
> LU WWN Device Id: 5 0014ee 25fc9e460
> Firmware Version: 80.00A80
> User Capacity:    3,000,591,900,160 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Tue Sep 13 07:53:18 2016 BST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> AAM feature is:   Unavailable
> APM feature is:   Unavailable
> Rd look-ahead is: Enabled
> Write cache is:   Enabled
> ATA Security is:  Disabled, NOT FROZEN [SEC1]
> Wt Cache Reorder: Enabled
>
> 193 Load_Cycle_Count        -O--CK   142   142   000    -    175500

I recall there is some tool or setting for the drive (idle3ctl from
idle3-tools, or WD's own wdidle3, if memory serves) that will stop it
from "parking" its heads after every short idle period. You should read
up on that and see if you can prevent this. Parking slows the drive
down every time it has to "restart" for a read/write after a short
period of inactivity.
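
To put that Load_Cycle_Count in perspective, here is a rough
calculation using the Power_On_Hours raw value (11128) from the same
smartctl output. The helper name is illustrative and awk is assumed to
be available:

```shell
# Average head load cycles per power-on hour, to one decimal place.
# $1: Load_Cycle_Count raw value, $2: Power_On_Hours raw value
load_cycles_per_hour() {
    awk -v c="$1" -v h="$2" 'BEGIN { printf "%.1f\n", c / h }'
}
```

'load_cycles_per_hour 175500 11128' prints 15.8, i.e. on average the
heads have parked roughly once every four minutes of power-on time.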

Once you fix those two issues, (which might be related to the cause of 
your problem), then someone else with more detailed knowledge can advise 
on the best next step.

Regards,
Adam

-- 
-- 
Adam Goryachev Website Managers www.websitemanagers.com.au


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13  6:56 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux-RAID
In-Reply-To: <57D72092.20704@youngman.org.uk>

> What does smartctl report on sdc and sde (I think you want smartctl -x,
> the extended "display everything" command)?

[root@lamachine ~]# smartctl -x /dev/{sdc,sdd,sde}
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl takes ONE device name as the final command-line argument.
You have provided 3 device names:
/dev/sdc
/dev/sdd
/dev/sde

Use smartctl -h to get a usage summary

[root@lamachine ~]# smartctl -x /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00D8PB0
Serial Number:    WD-WCC4NCWT13RF
LU WWN Device Id: 5 0014ee 25fc9e460
Firmware Version: 80.00A80
User Capacity:    3,000,591,900,160 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep 13 07:53:18 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38940) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 391) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   177   177   021    -    6116
  4 Start_Stop_Count        -O--CK   100   100   000    -    41
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   085   085   000    -    11128
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    41
192 Power-Off_Retract_Count -O--CK   200   200   000    -    20
193 Load_Cycle_Count        -O--CK   142   142   000    -    175500
194 Temperature_Celsius     -O---K   123   114   000    -    27
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    27 Celsius
Power Cycle Min/Max Temperature:     23/27 Celsius
Lifetime    Min/Max Temperature:     15/36 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (376)

Index    Estimated Time   Temperature Celsius
 377    2016-09-12 23:56     ?  -
 378    2016-09-12 23:57    23  ****
 379    2016-09-12 23:58    24  *****
 ...    ..(  4 skipped).    ..  *****
 384    2016-09-13 00:03    24  *****
 385    2016-09-13 00:04    25  ******
 386    2016-09-13 00:05    25  ******
 387    2016-09-13 00:06    25  ******
 388    2016-09-13 00:07    26  *******
 ...    ..(  2 skipped).    ..  *******
 391    2016-09-13 00:10    26  *******
 392    2016-09-13 00:11    27  ********
 ...    ..(  7 skipped).    ..  ********
 400    2016-09-13 00:19    27  ********
 401    2016-09-13 00:20    29  **********
 402    2016-09-13 00:21    28  *********
 ...    ..(  4 skipped).    ..  *********
 407    2016-09-13 00:26    28  *********
 408    2016-09-13 00:27    29  **********
 ...    ..(  4 skipped).    ..  **********
 413    2016-09-13 00:32    29  **********
 414    2016-09-13 00:33    30  ***********
 ...    ..(  5 skipped).    ..  ***********
 420    2016-09-13 00:39    30  ***********
 421    2016-09-13 00:40    31  ************
 ...    ..( 23 skipped).    ..  ************
 445    2016-09-13 01:04    31  ************
 446    2016-09-13 01:05    32  *************
 ...    ..( 24 skipped).    ..  *************
 471    2016-09-13 01:30    32  *************
 472    2016-09-13 01:31     ?  -
 473    2016-09-13 01:32    33  **************
 474    2016-09-13 01:33    32  *************
 ...    ..(  9 skipped).    ..  *************
   6    2016-09-13 01:43    32  *************
   7    2016-09-13 01:44    33  **************
 ...    ..( 20 skipped).    ..  **************
  28    2016-09-13 02:05    33  **************
  29    2016-09-13 02:06    32  *************
  30    2016-09-13 02:07    32  *************
  31    2016-09-13 02:08    32  *************
  32    2016-09-13 02:09    33  **************
  33    2016-09-13 02:10    32  *************
 ...    ..( 62 skipped).    ..  *************
  96    2016-09-13 03:13    32  *************
  97    2016-09-13 03:14     ?  -
  98    2016-09-13 03:15    24  *****
 ...    ..(  4 skipped).    ..  *****
 103    2016-09-13 03:20    24  *****
 104    2016-09-13 03:21    25  ******
 105    2016-09-13 03:22    25  ******
 106    2016-09-13 03:23    25  ******
 107    2016-09-13 03:24    26  *******
 ...    ..(  2 skipped).    ..  *******
 110    2016-09-13 03:27    26  *******
 111    2016-09-13 03:28    27  ********
 ...    ..(  6 skipped).    ..  ********
 118    2016-09-13 03:35    27  ********
 119    2016-09-13 03:36    28  *********
 ...    ..( 11 skipped).    ..  *********
 131    2016-09-13 03:48    28  *********
 132    2016-09-13 03:49    29  **********
 ...    ..(  8 skipped).    ..  **********
 141    2016-09-13 03:58    29  **********
 142    2016-09-13 03:59    30  ***********
 ...    ..( 11 skipped).    ..  ***********
 154    2016-09-13 04:11    30  ***********
 155    2016-09-13 04:12    31  ************
 ...    ..( 42 skipped).    ..  ************
 198    2016-09-13 04:55    31  ************
 199    2016-09-13 04:56     ?  -
 200    2016-09-13 04:57    22  ***
 201    2016-09-13 04:58    22  ***
 202    2016-09-13 04:59    23  ****
 203    2016-09-13 05:00    23  ****
 204    2016-09-13 05:01    24  *****
 205    2016-09-13 05:02    24  *****
 206    2016-09-13 05:03    25  ******
 ...    ..(  3 skipped).    ..  ******
 210    2016-09-13 05:07    25  ******
 211    2016-09-13 05:08    26  *******
 212    2016-09-13 05:09    27  ********
 ...    ..(  7 skipped).    ..  ********
 220    2016-09-13 05:17    27  ********
 221    2016-09-13 05:18    28  *********
 ...    ..( 13 skipped).    ..  *********
 235    2016-09-13 05:32    28  *********
 236    2016-09-13 05:33    29  **********
 ...    ..( 10 skipped).    ..  **********
 247    2016-09-13 05:44    29  **********
 248    2016-09-13 05:45    30  ***********
 ...    ..( 16 skipped).    ..  ***********
 265    2016-09-13 06:02    30  ***********
 266    2016-09-13 06:03    31  ************
 ...    ..( 96 skipped).    ..  ************
 363    2016-09-13 07:40    31  ************
 364    2016-09-13 07:41     ?  -
 365    2016-09-13 07:42    31  ************
 366    2016-09-13 07:43    31  ************
 367    2016-09-13 07:44    30  ***********
 368    2016-09-13 07:45    30  ***********
 369    2016-09-13 07:46    30  ***********
 370    2016-09-13 07:47    31  ************
 ...    ..(  5 skipped).    ..  ************
 376    2016-09-13 07:53    31  ************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         1280  Vendor specific

[root@lamachine ~]# smartctl -x /dev/sdd
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00D8PB0
Serial Number:    WD-WCC4NPRDD6D7
LU WWN Device Id: 5 0014ee 25fca27b1
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep 13 07:53:20 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (39060) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: (   2) minutes.
Extended self-test routine
recommended polling time: ( 392) minutes.
Conveyance self-test routine
recommended polling time: (   5) minutes.
SCT capabilities:       (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   176   176   021    -    6166
  4 Start_Stop_Count        -O--CK   100   100   000    -    41
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   085   085   000    -    11129
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    41
192 Power-Off_Retract_Count -O--CK   200   200   000    -    27
193 Load_Cycle_Count        -O--CK   137   137   000    -    191240
194 Temperature_Celsius     -O---K   123   114   000    -    27
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    27 Celsius
Power Cycle Min/Max Temperature:     23/27 Celsius
Lifetime    Min/Max Temperature:     15/36 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (401)

Index    Estimated Time   Temperature Celsius
 402    2016-09-12 23:56     ?  -
 403    2016-09-12 23:57    23  ****
 404    2016-09-12 23:58    23  ****
 405    2016-09-12 23:59    24  *****
 ...    ..(  3 skipped).    ..  *****
 409    2016-09-13 00:03    24  *****
 410    2016-09-13 00:04    25  ******
 411    2016-09-13 00:05    25  ******
 412    2016-09-13 00:06    25  ******
 413    2016-09-13 00:07    26  *******
 ...    ..(  2 skipped).    ..  *******
 416    2016-09-13 00:10    26  *******
 417    2016-09-13 00:11    27  ********
 ...    ..(  7 skipped).    ..  ********
 425    2016-09-13 00:19    27  ********
 426    2016-09-13 00:20    28  *********
 427    2016-09-13 00:21    29  **********
 ...    ..( 10 skipped).    ..  **********
 438    2016-09-13 00:32    29  **********
 439    2016-09-13 00:33    30  ***********
 ...    ..(  4 skipped).    ..  ***********
 444    2016-09-13 00:38    30  ***********
 445    2016-09-13 00:39    31  ************
 ...    ..( 23 skipped).    ..  ************
 469    2016-09-13 01:03    31  ************
 470    2016-09-13 01:04    32  *************
 ...    ..( 26 skipped).    ..  *************
  19    2016-09-13 01:31    32  *************
  20    2016-09-13 01:32     ?  -
  21    2016-09-13 01:33    33  **************
  22    2016-09-13 01:34    32  *************
 ...    ..( 11 skipped).    ..  *************
  34    2016-09-13 01:46    32  *************
  35    2016-09-13 01:47    33  **************
 ...    ..( 11 skipped).    ..  **************
  47    2016-09-13 01:59    33  **************
  48    2016-09-13 02:00    32  *************
 ...    ..( 73 skipped).    ..  *************
 122    2016-09-13 03:14    32  *************
 123    2016-09-13 03:15     ?  -
 124    2016-09-13 03:16    23  ****
 125    2016-09-13 03:17    24  *****
 ...    ..(  3 skipped).    ..  *****
 129    2016-09-13 03:21    24  *****
 130    2016-09-13 03:22    25  ******
 131    2016-09-13 03:23    25  ******
 132    2016-09-13 03:24    25  ******
 133    2016-09-13 03:25    26  *******
 134    2016-09-13 03:26    26  *******
 135    2016-09-13 03:27    26  *******
 136    2016-09-13 03:28    27  ********
 ...    ..(  7 skipped).    ..  ********
 144    2016-09-13 03:36    27  ********
 145    2016-09-13 03:37    28  *********
 ...    ..( 11 skipped).    ..  *********
 157    2016-09-13 03:49    28  *********
 158    2016-09-13 03:50    29  **********
 ...    ..(  9 skipped).    ..  **********
 168    2016-09-13 04:00    29  **********
 169    2016-09-13 04:01    30  ***********
 ...    ..( 12 skipped).    ..  ***********
 182    2016-09-13 04:14    30  ***********
 183    2016-09-13 04:15    31  ************
 ...    ..( 40 skipped).    ..  ************
 224    2016-09-13 04:56    31  ************
 225    2016-09-13 04:57     ?  -
 226    2016-09-13 04:58    22  ***
 227    2016-09-13 04:59    22  ***
 228    2016-09-13 05:00    23  ****
 229    2016-09-13 05:01    23  ****
 230    2016-09-13 05:02    24  *****
 231    2016-09-13 05:03    24  *****
 232    2016-09-13 05:04    25  ******
 ...    ..(  3 skipped).    ..  ******
 236    2016-09-13 05:08    25  ******
 237    2016-09-13 05:09    26  *******
 238    2016-09-13 05:10    27  ********
 ...    ..(  6 skipped).    ..  ********
 245    2016-09-13 05:17    27  ********
 246    2016-09-13 05:18    28  *********
 ...    ..( 13 skipped).    ..  *********
 260    2016-09-13 05:32    28  *********
 261    2016-09-13 05:33    29  **********
 ...    ..( 11 skipped).    ..  **********
 273    2016-09-13 05:45    29  **********
 274    2016-09-13 05:46    30  ***********
 ...    ..( 20 skipped).    ..  ***********
 295    2016-09-13 06:07    30  ***********
 296    2016-09-13 06:08    31  ************
 ...    ..( 92 skipped).    ..  ************
 389    2016-09-13 07:41    31  ************
 390    2016-09-13 07:42     ?  -
 391    2016-09-13 07:43    31  ************
 ...    ..(  9 skipped).    ..  ************
 401    2016-09-13 07:53    31  ************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         1283  Vendor specific

[root@lamachine ~]# smartctl -x /dev/sde
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00D8PB0
Serial Number:    WD-WCC4N1294906
LU WWN Device Id: 5 0014ee 25f968120
Firmware Version: 80.00A80
User Capacity:    3,000,591,900,160 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep 13 07:53:23 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (43200) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: (   2) minutes.
Extended self-test routine
recommended polling time: ( 433) minutes.
Conveyance self-test routine
recommended polling time: (   5) minutes.
SCT capabilities:       (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   175   175   021    -    6208
  4 Start_Stop_Count        -O--CK   100   100   000    -    40
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   085   085   000    -    11141
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    40
192 Power-Off_Retract_Count -O--CK   200   200   000    -    27
193 Load_Cycle_Count        -O--CK   143   143   000    -    172837
194 Temperature_Celsius     -O---K   123   113   000    -    27
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    27 Celsius
Power Cycle Min/Max Temperature:     23/27 Celsius
Lifetime    Min/Max Temperature:     16/37 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (145)

Index    Estimated Time   Temperature Celsius
 146    2016-09-12 23:56     ?  -
 147    2016-09-12 23:57    23  ****
 148    2016-09-12 23:58    23  ****
 149    2016-09-12 23:59    24  *****
 ...    ..(  3 skipped).    ..  *****
 153    2016-09-13 00:03    24  *****
 154    2016-09-13 00:04    25  ******
 155    2016-09-13 00:05    25  ******
 156    2016-09-13 00:06    25  ******
 157    2016-09-13 00:07    26  *******
 158    2016-09-13 00:08    26  *******
 159    2016-09-13 00:09    26  *******
 160    2016-09-13 00:10    27  ********
 ...    ..(  8 skipped).    ..  ********
 169    2016-09-13 00:19    27  ********
 170    2016-09-13 00:20    28  *********
 ...    ..(  6 skipped).    ..  *********
 177    2016-09-13 00:27    28  *********
 178    2016-09-13 00:28    29  **********
 ...    ..(  5 skipped).    ..  **********
 184    2016-09-13 00:34    29  **********
 185    2016-09-13 00:35    30  ***********
 ...    ..(  5 skipped).    ..  ***********
 191    2016-09-13 00:41    30  ***********
 192    2016-09-13 00:42    31  ************
 ...    ..( 25 skipped).    ..  ************
 218    2016-09-13 01:08    31  ************
 219    2016-09-13 01:09    32  *************
 ...    ..( 20 skipped).    ..  *************
 240    2016-09-13 01:30    32  *************
 241    2016-09-13 01:31     ?  -
 242    2016-09-13 01:32    33  **************
 243    2016-09-13 01:33    32  *************
 ...    ..( 35 skipped).    ..  *************
 279    2016-09-13 02:09    32  *************
 280    2016-09-13 02:10    33  **************
 ...    ..( 63 skipped).    ..  **************
 344    2016-09-13 03:14    33  **************
 345    2016-09-13 03:15     ?  -
 346    2016-09-13 03:16    23  ****
 347    2016-09-13 03:17    24  *****
 ...    ..(  3 skipped).    ..  *****
 351    2016-09-13 03:21    24  *****
 352    2016-09-13 03:22    25  ******
 353    2016-09-13 03:23    25  ******
 354    2016-09-13 03:24    25  ******
 355    2016-09-13 03:25    26  *******
 356    2016-09-13 03:26    26  *******
 357    2016-09-13 03:27    26  *******
 358    2016-09-13 03:28    27  ********
 ...    ..(  5 skipped).    ..  ********
 364    2016-09-13 03:34    27  ********
 365    2016-09-13 03:35    28  *********
 ...    ..(  9 skipped).    ..  *********
 375    2016-09-13 03:45    28  *********
 376    2016-09-13 03:46    29  **********
 ...    ..(  5 skipped).    ..  **********
 382    2016-09-13 03:52    29  **********
 383    2016-09-13 03:53    30  ***********
 ...    ..(  7 skipped).    ..  ***********
 391    2016-09-13 04:01    30  ***********
 392    2016-09-13 04:02    31  ************
 ...    ..( 28 skipped).    ..  ************
 421    2016-09-13 04:31    31  ************
 422    2016-09-13 04:32    32  *************
 ...    ..( 23 skipped).    ..  *************
 446    2016-09-13 04:56    32  *************
 447    2016-09-13 04:57     ?  -
 448    2016-09-13 04:58    22  ***
 449    2016-09-13 04:59    22  ***
 450    2016-09-13 05:00    23  ****
 451    2016-09-13 05:01    23  ****
 452    2016-09-13 05:02    24  *****
 453    2016-09-13 05:03    24  *****
 454    2016-09-13 05:04    25  ******
 ...    ..(  3 skipped).    ..  ******
 458    2016-09-13 05:08    25  ******
 459    2016-09-13 05:09    26  *******
 460    2016-09-13 05:10    27  ********
 461    2016-09-13 05:11    28  *********
 ...    ..( 12 skipped).    ..  *********
 474    2016-09-13 05:24    28  *********
 475    2016-09-13 05:25    29  **********
 ...    ..(  6 skipped).    ..  **********
   4    2016-09-13 05:32    29  **********
   5    2016-09-13 05:33    30  ***********
 ...    ..(  8 skipped).    ..  ***********
  14    2016-09-13 05:42    30  ***********
  15    2016-09-13 05:43    31  ************
 ...    ..( 41 skipped).    ..  ************
  57    2016-09-13 06:25    31  ************
  58    2016-09-13 06:26    32  *************
 ...    ..( 74 skipped).    ..  *************
 133    2016-09-13 07:41    32  *************
 134    2016-09-13 07:42     ?  -
 135    2016-09-13 07:43    32  *************
 ...    ..(  9 skipped).    ..  *************
 145    2016-09-13 07:53    32  *************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         1285  Vendor specific

[root@lamachine ~]#

> And looking at the lsdrv stuff, were you using LVM? That's got a load of
> references to LV.

Yes, I was using LVM on top of RAID.

Thanks for the help so far, guys!

* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-13  6:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAJCQCtRQmyvCqCqvmKazPg7cFrZ9A5SzZ+Yts_B3SxHU5s68GQ@mail.gmail.com>

> What version of parted?

parted (GNU parted) 3.2

> Are these two drives on the same
> controller?

yes afaict

* Re: Inactive arrays
From: Wols Lists @ 2016-09-12 21:39 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: linux-raid
In-Reply-To: <CAHscji2k9FyrOs+96XQWEUWU3W0EWxkanaHakjsGhij0hXiVGw@mail.gmail.com>

On 12/09/16 22:13, Daniel Sanabria wrote:
> apologies for the verbosity just adding some more info which is now
> making me lose hope. Using parted -l instead of fdisk gives me this:

Hmmm...

I'd wait and let some of the experts in disk recovery chime in before it
gets that far. The fact that fdisk found the partitions on sdc and sde
is a hopeful sign. And providing something hasn't scribbled all over the
disk, several of them know their way around partition tables and can
recreate them if they're not totally scrambled.

I think using fdisk on the drives was a mistake, but seeing as -l
doesn't write anything it was a harmless mistake. iirc fdisk can't
handle gpt disks, or stuff over 2TB, so that's where that problem lay.
Try using gdisk instead of parted, that might behave just that little
bit differently, and so long as it doesn't write anything, it won't do
any harm.
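
To put a number on the 2TB limit: an MBR partition entry stores its start and length as 32-bit sector counts, so with 512-byte logical sectors one entry tops out just under 2 TiB. A small Python sketch of the arithmetic, using the sector size and disk size from the fdisk output earlier in the thread (the variable names are only for illustration):

```python
SECTOR_BYTES = 512            # logical sector size reported by fdisk
MAX_SECTORS = 2**32 - 1       # largest value a 32-bit MBR field can hold

limit_bytes = MAX_SECTORS * SECTOR_BYTES
print(MAX_SECTORS)            # same sector count fdisk lists for the "ee GPT" entry
print(limit_bytes / 2**40)    # size limit in TiB, just under 2

disk_bytes = 3000591900160    # /dev/sdc as reported by fdisk
print(disk_bytes > limit_bytes)   # is the disk bigger than one MBR entry can describe?
```

That is why the single "ee" entry fdisk shows on sdc and sde stops at 4294967295 sectors: it is only the protective MBR entry, not a description of the real partitions.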

What does smartctl report on sdc and sde (I think you want smartctl -x,
the extended "display everything" command)?

And looking at the lsdrv stuff, were you using LVM? That's got a load of
references to LV. The snag is, I'm now getting a bit out of my depth - I
tend to "first respond" and ask for the information that I know the
experts will want. If Phil or someone else now takes a look at this
thread it'll contain all the information they need, but I think we need
to wait for them to chime in now. So hopefully they'll see this tonight
and be on hand soon... I'll come back if I think of anything new ...

Cheers,
Wol


* Re: Inactive arrays
From: Chris Murphy @ 2016-09-12 21:37 UTC (permalink / raw)
  To: Daniel Sanabria; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <CAHscji2k9FyrOs+96XQWEUWU3W0EWxkanaHakjsGhij0hXiVGw@mail.gmail.com>

On Mon, Sep 12, 2016 at 3:13 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote:

> Error: Invalid argument during seek for read on /dev/sdc
> Retry/Ignore/Cancel? R
> Error: Invalid argument during seek for read on /dev/sdc
> Retry/Ignore/Cancel? I
> Error: The backup GPT table is corrupt, but the primary appears OK, so
> that will be used.
> OK/Cancel? O
> Model: ATA WDC WD30EZRX-00D (scsi)
> Disk /dev/sdc: 3001GB
> Sector size (logical/physical): 512B/4096B
> Partition Table: unknown
> Disk Flags:

What version of parted?

This looks like a problem due to the error, followed by "primary
appears OK" but instead of using that like it says, it reports the
Partition Table as unknown. Not expected.



> Error: Invalid argument during seek for read on /dev/sde
> Retry/Ignore/Cancel? C
> Model: ATA WDC WD30EZRX-00D (scsi)
> Disk /dev/sde: 3001GB
> Sector size (logical/physical): 512B/4096B
> Partition Table: unknown
> Disk Flags:


And the same thing with a second drive? What are the chances? Can you
post a complete dmesg somewhere? Are these two drives on the same
controller?
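
One thing that may be worth checking in the earlier fdisk output: sdc and sde report fewer sectors than sdd, although all three are the same model. GPT keeps its backup header in the last sector of the disk, so if a drive now looks shorter than it did when the table was written, the backup would sit past the visible end. A rough Python sketch of the arithmetic, using the sector counts from fdisk; that sdc and sde originally had the same size as sdd is an assumption, not something the output proves:

```python
# Sector counts as reported by "fdisk -l" earlier in the thread
SECTORS = {
    "sdc": 5860531055,
    "sdd": 5860533168,
    "sde": 5860531055,
}

reference = SECTORS["sdd"]          # assumed to be the full size of this model
backup_header_lba = reference - 1   # GPT backup header sits in the last sector

for name, count in sorted(SECTORS.items()):
    missing = reference - count
    last_visible_lba = count - 1
    reachable = backup_header_lba <= last_visible_lba
    print(name, "missing sectors:", missing, "backup header reachable:", reachable)
```

If that assumption holds, sdc and sde are each 2113 sectors short, which would at least be consistent with parted complaining about the backup GPT table and failing to seek.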


smartctl -x for each drive?


-- 
Chris Murphy


* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-12 21:13 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <CAHscji35kySP4Q8cUpYCXR2P9DBwPcYeWjr_HX==TNX9CPW9NA@mail.gmail.com>

apologies for the verbosity just adding some more info which is now
making me lose hope. Using parted -l instead of fdisk gives me this:

[root@lamachine ~]# parted -l
Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  31.5GB  31.5GB  primary                raid
 2      31.5GB  294GB   262GB   primary   ext4         raid
 3      294GB   500GB   207GB   extended
 5      294GB   326GB   32.2GB  logical
 6      336GB   339GB   3644MB  logical                raid


Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start  End    Size    Type      File system     Flags
 2      210MB  262GB  262GB   primary                   raid
 3      262GB  271GB  8389MB  primary   linux-swap(v1)
 4      271GB  500GB  229GB   extended
 5      271GB  303GB  32.2GB  logical
 6      313GB  317GB  3644MB  logical                   raid


Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? R
Error: Invalid argument during seek for read on /dev/sdc
Retry/Ignore/Cancel? I
Error: The backup GPT table is corrupt, but the primary appears OK, so
that will be used.
OK/Cancel? O
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:

Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sdd: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2199GB  2199GB                     raid
 2      2199GB  2736GB  537GB


Error: Invalid argument during seek for read on /dev/sde
Retry/Ignore/Cancel? C
Model: ATA WDC WD30EZRX-00D (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:

Model: ATA WDC WD5000AAKS-0 (scsi)
Disk /dev/sdf: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      1049kB  210MB   209MB   primary   ext4         boot
 2      210MB   31.7GB  31.5GB  primary                raid
 3      31.7GB  294GB   262GB   primary   ext4         raid
 4      294GB   500GB   206GB   extended
 5      294GB   326GB   32.2GB  logical
 6      336GB   340GB   3644MB  logical                raid


Model: Linux Software RAID Array (md)
Disk /dev/md2: 524GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End    Size   File system  Flags
 1      0.00B  524GB  524GB  ext4


Error: /dev/md126: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md126: 31.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md127: 96.6GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:



On 12 September 2016 at 20:41, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> ok, I just adjusted system time so that I can start tracking logs.
>
> what I'm noticing however is that fdisk -l is not giving me the expected
> partitions (I was expecting at least 2 partitions in every 2.7 TiB disk
> similar to what I have in sdd):
>
> [root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde}
> Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device     Boot Start        End    Sectors Size Id Type
> /dev/sdc1           1 4294967295 4294967295   2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: gpt
> Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9
>
> Device          Start        End    Sectors  Size Type
> /dev/sdd1        2048 4294969343 4294967296    2T Linux RAID
> /dev/sdd2  4294969344 5343545343 1048576000  500G Linux filesystem
> Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
>
> Device     Boot Start        End    Sectors Size Id Type
> /dev/sde1           1 4294967295 4294967295   2T ee GPT
>
> Partition 1 does not start on physical sector boundary.
> [root@lamachine lamachine_220315]#
>
> what could've happened here? any ideas why the partition tables ended
> up like that?
>
> From previous information I have an idea of what md128 and md129
> are supposed to look like (also noticed that the device names
> changed):
>
> # md128 and md129 details From an old command output
> /dev/md128:
>         Version : 1.2
>   Creation Time : Fri Oct 24 15:24:38 2014
>      Raid Level : raid5
>      Array Size : 4294705152 (4095.75 GiB 4397.78 GB)
>   Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Sun Mar 22 06:20:08 2015
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : lamachine:128  (local to host lamachine)
>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>          Events : 4041
>
>     Number   Major   Minor   RaidDevice State
>        0       8       49        0      active sync   /dev/sdd1
>        1       8       65        1      active sync   /dev/sde1
>        3       8       81        2      active sync   /dev/sdf1
> /dev/md129:
>         Version : 1.2
>   Creation Time : Mon Nov 10 16:28:11 2014
>      Raid Level : raid0
>      Array Size : 1572470784 (1499.63 GiB 1610.21 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Nov 10 16:28:11 2014
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : lamachine:129  (local to host lamachine)
>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>          Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       50        0      active sync   /dev/sdd2
>        1       8       66        1      active sync   /dev/sde2
>        2       8       82        2      active sync   /dev/sdf2
>
> Is there any way to recover the contents of these two arrays ? :(
>
> On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>> However I'm noticing that the details with this new MB are somewhat different:
>>
>> [root@lamachine ~]# cat /etc/mdadm.conf
>> # mdadm.conf written out by anaconda
>> MAILADDR root
>> AUTO +imsm +1.x -all
>> ARRAY /dev/md2 level=raid5 num-devices=3
>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>> ARRAY /dev/md126 level=raid10 num-devices=2
>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>> ARRAY /dev/md127 level=raid0 num-devices=3
>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>> [root@lamachine ~]# mdadm --detail /dev/md1*
>> /dev/md126:
>>         Version : 0.90
>>   Creation Time : Thu Dec  3 22:12:12 2009
>>      Raid Level : raid10
>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>    Raid Devices : 2
>>   Total Devices : 2
>> Preferred Minor : 126
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jan 12 04:03:41 2016
>>           State : clean
>>  Active Devices : 2
>> Working Devices : 2
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : near=2
>>      Chunk Size : 64K
>>
>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>          Events : 0.264152
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       82        0      active sync set-A   /dev/sdf2
>>        1       8        1        1      active sync set-B   /dev/sda1
>> /dev/md127:
>>         Version : 1.2
>>   Creation Time : Tue Jul 26 19:00:28 2011
>>      Raid Level : raid0
>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>    Raid Devices : 3
>>   Total Devices : 3
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jul 26 19:00:28 2011
>>           State : clean
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>      Chunk Size : 512K
>>
>>            Name : reading.homeunix.com:3
>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       85        0      active sync   /dev/sdf5
>>        1       8       21        1      active sync   /dev/sdb5
>>        2       8        5        2      active sync   /dev/sda5
>> /dev/md128:
>>         Version : 1.2
>>      Raid Level : raid0
>>   Total Devices : 1
>>     Persistence : Superblock is persistent
>>
>>           State : inactive
>>
>>            Name : lamachine:128  (local to host lamachine)
>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>          Events : 4154
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       49        -        /dev/sdd1
>> /dev/md129:
>>         Version : 1.2
>>      Raid Level : raid0
>>   Total Devices : 1
>>     Persistence : Superblock is persistent
>>
>>           State : inactive
>>
>>            Name : lamachine:129  (local to host lamachine)
>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       50        -        /dev/sdd2
>> [root@lamachine ~]# mdadm --detail /dev/md2*
>> /dev/md2:
>>         Version : 0.90
>>   Creation Time : Mon Feb 11 07:54:36 2013
>>      Raid Level : raid5
>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>    Raid Devices : 3
>>   Total Devices : 3
>> Preferred Minor : 2
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Tue Jan 12 02:31:50 2016
>>           State : clean
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>          Events : 0.611
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       83        0      active sync   /dev/sdf3
>>        1       8       18        1      active sync   /dev/sdb2
>>        2       8        2        2      active sync   /dev/sda2
>> [root@lamachine ~]# cat /proc/mdstat
>> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
>> md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
>>       94367232 blocks super 1.2 512k chunks
>>
>> md129 : inactive sdd2[2](S)
>>       524156928 blocks super 1.2
>>
>> md128 : inactive sdd1[3](S)
>>       2147352576 blocks super 1.2
>>
>> md126 : active raid10 sdf2[0] sda1[1]
>>       30719936 blocks 2 near-copies [2/2] [UU]
>>
>> unused devices: <none>
>> [root@lamachine ~]#
>>
>> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>> ok, system up and running after MB was replaced however the arrays
>>> remain inactive.
>>>
>>> mdadm version is:
>>> mdadm - v3.3.4 - 3rd August 2015
>>>
>>> Here's the output from Phil's lsdrv:
>>>
>>> [root@lamachine ~]# ./lsdrv
>>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series
>>> chipset 6-Port SATA AHCI Controller (rev 06)
>>> ├scsi 0:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASZ0505379}
>>> │└sda 465.76g [8:0] Partitioned (dos)
>>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │ │                    PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ │ └VG vg_bigblackbox 29.29g 1.26g free
>>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
>>> │ │  ├dm-2 7.81g [253:2] LV LogVol_opt ext4
>>> {b08d7f5e-f15f-4241-804e-edccecab6003}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
>>> │ │  ├dm-0 9.77g [253:0] LV LogVol_root ext4
>>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
>>> │ │  ├dm-3 1.95g [253:3] LV LogVol_tmp ext4
>>> {f6b46363-170b-4038-83bd-2c5f9f6a1973}
>>> │ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
>>> │ │  └dm-1 8.50g [253:1] LV LogVol_var ext4
>>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
>>> │ │   └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
>>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │ │                 ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ │ └Mounted as /dev/md2 @ /home
>>> │ ├sda3 1.00k [8:3] Partitioned (dos)
>>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │ │                    PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
>>> │ │  ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
>>> │ │  ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
>>> │ │  ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
>>> │ │  ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
>>> │ │  ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
>>> │ │  ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
>>> │ │  └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
>>> │ └sda6 3.39g [8:6] Empty/Unknown
>>> ├scsi 1:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WCASY7694185}
>>> │└sdb 465.76g [8:16] Partitioned (dos)
>>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
>>> │ ├sdb4 1.00k [8:20] Partitioned (dos)
>>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdb6 3.39g [8:22] Empty/Unknown
>>> ├scsi 2:x:x:x [Empty]
>>> ├scsi 3:x:x:x [Empty]
>>> ├scsi 4:x:x:x [Empty]
>>> └scsi 5:x:x:x [Empty]
>>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd.
>>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
>>> ├scsi 6:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
>>> │└sdc 2.73t [8:32] Partitioned (PMBR)
>>> ├scsi 7:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
>>> │└sdd 2.73t [8:48] Partitioned (gpt)
>>> │ ├sdd1 2.00t [8:49] MD  (none/) spare 'lamachine:128'
>>> {f2372cb9-d381-6fd6-ce86-d826882ec82e}
>>> │ │└md128 0.00k [9:128] MD v1.2  () inactive, None (None) None
>>> {f2372cb9:d3816fd6:ce86d826:882ec82e}
>>> │ │                     Empty/Unknown
>>> │ └sdd2 500.00g [8:50] MD  (none/) spare 'lamachine:129'
>>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
>>> │  └md129 0.00k [9:129] MD v1.2  () inactive, None (None) None
>>> {895dae98:d1a496de:4f590b8b:cb8ac12a}
>>> │                       Empty/Unknown
>>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N1294906}
>>> │└sde 2.73t [8:64] Partitioned (PMBR)
>>> ├scsi 9:0:0:0 ATA      WDC WD5000AAKS-0 {WD-WMAWF0085724}
>>> │└sdf 465.76g [8:80] Partitioned (dos)
>>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
>>> │ │└Mounted as /dev/sdf1 @ /boot
>>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync
>>> {9af006ca-8845-bbd3-bfe7-8010bc810f04}
>>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk
>>> {9af006ca:8845bbd3:bfe78010:bc810f04}
>>> │ │                      PV LVM2_member 28.03g used, 1.26g free
>>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
>>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync
>>> {2cff15d1-e411-447b-fd5d-472103e44022}
>>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk
>>> {2cff15d1:e411447b:fd5d4721:03e44022}
>>> │ │                   ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
>>> │ ├sdf4 1.00k [8:84] Partitioned (dos)
>>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync
>>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
>>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None
>>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
>>> │ │                      PV LVM2_member 86.00g used, 3.99g free
>>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
>>> │ └sdf6 3.39g [8:86] Empty/Unknown
>>> ├scsi 10:x:x:x [Empty]
>>> ├scsi 11:x:x:x [Empty]
>>> └scsi 12:x:x:x [Empty]
>>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation
>>> C602 chipset 4-Port SATA Storage Control Unit (rev 06)
>>> └scsi 14:x:x:x [Empty]
>>> [root@lamachine ~]#
>>>
>>> Thanks in advance for any recommendations on what steps to take in
>>> order to bring these arrays back online.
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>>
>>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
>>>> Thanks very much for the response Wol.
>>>>
>>>> It looks like the PSU is dead (server automatically powers off a few
>>>> seconds after power on).
>>>>
>>>> I'm planning to order a PSU replacement to resume troubleshooting so
>>>> please bear with me; maybe the PSU was degraded and couldn't power
>>>> some of the drives?
>>>>
>>>> Cheers,
>>>>
>>>> Daniel
>>>>
>>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
>>>>> Just a quick first response. I see md128 and md129 are both down, and
>>>>> are both listed as one drive, raid0. Bit odd, that ...
>>>>>
>>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?)
>>>>> that would split an array in two. Is it possible that you should have
>>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...
>>>>>
>>>>> I notice also that md126 is raid10 across two drives. That's odd, too.
>>>>>
>>>>> How much do you know about what the setup should be, and why it was set
>>>>> up that way?
>>>>>
>>>>> Download lsdrv by Phil Turmel (it requires python2.7, if your machine is
>>>>> python3 a quick fix to the shebang at the start should get it to work).
>>>>> Post the output from that here.
>>>>>
>>>>> Cheers,
>>>>> Wol
>>>>>
>>>>> On 02/08/16 08:36, Daniel Sanabria wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have a box that I believe was not powered down correctly and after
>>>>>> transporting it to a different location it doesn't boot anymore
>>>>>> stopping at BIOS check "Verifying DMI Pool Data".
>>>>>>
>>>>>> The box has 6 drives and after instructing the BIOS to boot from the
>>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out
>>>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows:
>>>>>>
>>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab
>>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC
>>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>
>>>>>> #
>>>>>> # /etc/fstab
>>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015
>>>>>> #
>>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk'
>>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
>>>>>> #
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_root /                       ext4
>>>>>> defaults        1 1
>>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot                   ext4
>>>>>>    defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt                    ext4
>>>>>> defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp                    ext4
>>>>>> defaults        1 2
>>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var                    ext4
>>>>>> defaults        1 2
>>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap                    swap
>>>>>>    defaults        0 0
>>>>>> /dev/md2 /home          ext4    defaults        1 2
>>>>>> #/dev/vg_media/lv_media  /mnt/media      ext4    defaults        1 2
>>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2
>>>>>> [root@lamachine ~]#
>>>>>>
>>>>>> When checking mdstat I can see that 2 of the arrays are showing up as
>>>>>> inactive, but not sure how to safely activate these so looking for
>>>>>> some knowledgeable advice on how to proceed here.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> Below some more relevant outputs:
>>>>>>
>>>>>> [root@lamachine ~]# cat /proc/mdstat
>>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
>>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
>>>>>>       94367232 blocks super 1.2 512k chunks
>>>>>>
>>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
>>>>>>       511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>>>>>
>>>>>> md128 : inactive sdf1[3](S)
>>>>>>       2147352576 blocks super 1.2
>>>>>>
>>>>>> md129 : inactive sdf2[2](S)
>>>>>>       524156928 blocks super 1.2
>>>>>>
>>>>>> md126 : active raid10 sda2[0] sdc1[1]
>>>>>>       30719936 blocks 2 near-copies [2/2] [UU]
>>>>>>
>>>>>> unused devices: <none>
>>>>>> [root@lamachine ~]# cat /etc/mdadm.conf
>>>>>> # mdadm.conf written out by anaconda
>>>>>> MAILADDR root
>>>>>> AUTO +imsm +1.x -all
>>>>>> ARRAY /dev/md2 level=raid5 num-devices=3
>>>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022
>>>>>> ARRAY /dev/md126 level=raid10 num-devices=2
>>>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>> ARRAY /dev/md127 level=raid0 num-devices=3
>>>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128
>>>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129
>>>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md1*
>>>>>> /dev/md126:
>>>>>>         Version : 0.90
>>>>>>   Creation Time : Thu Dec  3 22:12:12 2009
>>>>>>      Raid Level : raid10
>>>>>>      Array Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>>   Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
>>>>>>    Raid Devices : 2
>>>>>>   Total Devices : 2
>>>>>> Preferred Minor : 126
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Tue Aug  2 07:46:39 2016
>>>>>>           State : clean
>>>>>>  Active Devices : 2
>>>>>> Working Devices : 2
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>          Layout : near=2
>>>>>>      Chunk Size : 64K
>>>>>>
>>>>>>            UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
>>>>>>          Events : 0.264152
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        2        0      active sync set-A   /dev/sda2
>>>>>>        1       8       33        1      active sync set-B   /dev/sdc1
>>>>>> /dev/md127:
>>>>>>         Version : 1.2
>>>>>>   Creation Time : Tue Jul 26 19:00:28 2011
>>>>>>      Raid Level : raid0
>>>>>>      Array Size : 94367232 (90.00 GiB 96.63 GB)
>>>>>>    Raid Devices : 3
>>>>>>   Total Devices : 3
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Tue Jul 26 19:00:28 2011
>>>>>>           State : clean
>>>>>>  Active Devices : 3
>>>>>> Working Devices : 3
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>      Chunk Size : 512K
>>>>>>
>>>>>>            Name : reading.homeunix.com:3
>>>>>>            UUID : acd5374f:72628c93:6a906c4b:5f675ce5
>>>>>>          Events : 0
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        5        0      active sync   /dev/sda5
>>>>>>        1       8       21        1      active sync   /dev/sdb5
>>>>>>        2       8       37        2      active sync   /dev/sdc5
>>>>>> /dev/md128:
>>>>>>         Version : 1.2
>>>>>>      Raid Level : raid0
>>>>>>   Total Devices : 1
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>           State : inactive
>>>>>>
>>>>>>            Name : lamachine:128  (local to host lamachine)
>>>>>>            UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
>>>>>>          Events : 4154
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice
>>>>>>
>>>>>>        -       8       81        -        /dev/sdf1
>>>>>> /dev/md129:
>>>>>>         Version : 1.2
>>>>>>      Raid Level : raid0
>>>>>>   Total Devices : 1
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>           State : inactive
>>>>>>
>>>>>>            Name : lamachine:129  (local to host lamachine)
>>>>>>            UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
>>>>>>          Events : 0
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice
>>>>>>
>>>>>>        -       8       82        -        /dev/sdf2
>>>>>> [root@lamachine ~]# mdadm --detail /dev/md2
>>>>>> /dev/md2:
>>>>>>         Version : 0.90
>>>>>>   Creation Time : Mon Feb 11 07:54:36 2013
>>>>>>      Raid Level : raid5
>>>>>>      Array Size : 511999872 (488.28 GiB 524.29 GB)
>>>>>>   Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
>>>>>>    Raid Devices : 3
>>>>>>   Total Devices : 3
>>>>>> Preferred Minor : 2
>>>>>>     Persistence : Superblock is persistent
>>>>>>
>>>>>>     Update Time : Mon Aug  1 20:24:23 2016
>>>>>>           State : clean
>>>>>>  Active Devices : 3
>>>>>> Working Devices : 3
>>>>>>  Failed Devices : 0
>>>>>>   Spare Devices : 0
>>>>>>
>>>>>>          Layout : left-symmetric
>>>>>>      Chunk Size : 64K
>>>>>>
>>>>>>            UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
>>>>>>          Events : 0.611
>>>>>>
>>>>>>     Number   Major   Minor   RaidDevice State
>>>>>>        0       8        3        0      active sync   /dev/sda3
>>>>>>        1       8       18        1      active sync   /dev/sdb2
>>>>>>        2       8       34        2      active sync   /dev/sdc2
>>>>>> [root@lamachine ~]#
