linux-raid.vger.kernel.org archive mirror
* raid6 issues
@ 2011-06-16 20:28 Chad Walker
  2011-06-18 19:48 ` Chad Walker
  0 siblings, 1 reply; 28+ messages in thread
From: Chad Walker @ 2011-06-16 20:28 UTC (permalink / raw)
  To: linux-raid

I have 15 drives in a raid6 plus a spare. I returned home after being
gone for 12 days and one of the drives was marked as faulty. The load
on the machine was crazy, and mdadm stopped responding. I should've
done an strace, sorry. Likewise, cat'ing /proc/mdstat was blocking. I
rebooted and mdadm started recovering, but onto the faulty drive. I
checked in on /proc/mdstat periodically over the 35-hour recovery.
When it was down to the last bit, /proc/mdstat and mdadm stopped
responding again. I gave it 28 hours, and then, when I still couldn't
get any insight into it, I rebooted again. Now /proc/mdstat says the
array is inactive, and I don't appear to be able to assemble it. I
issued --examine on each of the 16 drives and they all agreed with
each other except for the faulty drive. I popped the faulty drive out
and rebooted again; still no luck assembling.

This is what my /proc/mdstat looks like:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md1 : inactive sdd1[12](S) sdm1[6](S) sdf1[0](S) sdh1[2](S) sdi1[7](S)
sdb1[14](S) sdo1[4](S) sdg1[1](S) sdl1[8](S) sdk1[9](S) sdc1[13](S)
sdn1[3](S) sdj1[10](S) sdp1[15](S) sde1[11](S)
      29302715520 blocks

unused devices: <none>

This is what the --examine output for /dev/sd[b-o]1 and /dev/sdq1 looks like:
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 78e3f473:48bbfc34:0e051622:5c30970b
  Creation Time : Wed Mar 30 14:48:46 2011
     Raid Level : raid6
  Used Dev Size : 1953514368 (1863.02 GiB 2000.40 GB)
     Array Size : 25395686784 (24219.21 GiB 26005.18 GB)
   Raid Devices : 15
  Total Devices : 16
Preferred Minor : 1

    Update Time : Wed Jun 15 07:45:12 2011
          State : active
 Active Devices : 14
Working Devices : 15
 Failed Devices : 1
  Spare Devices : 1
       Checksum : e4ff038f - correct
         Events : 38452

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    14       8       17       14      active sync   /dev/sdb1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       97        1      active sync   /dev/sdg1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      209        3      active sync   /dev/sdn1
   4     4       8      225        4      active sync   /dev/sdo1
   5     5       0        0        5      faulty removed
   6     6       8      193        6      active sync   /dev/sdm1
   7     7       8      129        7      active sync   /dev/sdi1
   8     8       8      177        8      active sync   /dev/sdl1
   9     9       8      161        9      active sync   /dev/sdk1
  10    10       8      145       10      active sync   /dev/sdj1
  11    11       8       65       11      active sync   /dev/sde1
  12    12       8       49       12      active sync   /dev/sdd1
  13    13       8       33       13      active sync   /dev/sdc1
  14    14       8       17       14      active sync   /dev/sdb1
  15    15      65        1       15      spare   /dev/sdq1

And this is what --examine for /dev/sdp1 looked like:
/dev/sdp1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 78e3f473:48bbfc34:0e051622:5c30970b
  Creation Time : Wed Mar 30 14:48:46 2011
     Raid Level : raid6
  Used Dev Size : 1953514368 (1863.02 GiB 2000.40 GB)
     Array Size : 25395686784 (24219.21 GiB 26005.18 GB)
   Raid Devices : 15
  Total Devices : 16
Preferred Minor : 1

    Update Time : Tue Jun 14 07:35:56 2011
          State : active
 Active Devices : 15
Working Devices : 16
 Failed Devices : 0
  Spare Devices : 1
       Checksum : e4fdb07b - correct
         Events : 38433

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8      241        5      active sync   /dev/sdp1

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       97        1      active sync   /dev/sdg1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      209        3      active sync   /dev/sdn1
   4     4       8      225        4      active sync   /dev/sdo1
   5     5       8      241        5      active sync   /dev/sdp1
   6     6       8      193        6      active sync   /dev/sdm1
   7     7       8      129        7      active sync   /dev/sdi1
   8     8       8      177        8      active sync   /dev/sdl1
   9     9       8      161        9      active sync   /dev/sdk1
  10    10       8      145       10      active sync   /dev/sdj1
  11    11       8       65       11      active sync   /dev/sde1
  12    12       8       49       12      active sync   /dev/sdd1
  13    13       8       33       13      active sync   /dev/sdc1
  14    14       8       17       14      active sync   /dev/sdb1
  15    15      65        1       15      spare   /dev/sdq1

I was scared to run mdadm --build --level=6 --raid-devices=15 /dev/md1
/dev/sdf1 /dev/sdg1....
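
(As I understand it, --build is meant for arrays without superblocks,
so it would ignore the existing metadata entirely. What I was thinking
of trying instead - just a sketch, I haven't run it, and the member
list is taken from the --examine output above - is stopping the
inactive array and then forcing an assemble from the superblocks:

mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[b-p]1

--force tells mdadm to start the array even though the event counts
don't all agree; the stale sdp1 shouldn't come back as an active
member.)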

system information:
Ubuntu 11.04, kernel 2.6.38, x86_64, mdadm version 3.1.4, 3ware 9650SE

Any advice? There's about 1TB of data on these drives that would cause
my wife to kill me (and about 9TB that would just irritate her to
lose).

-chad

* RAID6 issues
@ 2011-09-13  6:14 Andriano
  2011-09-13  6:25 ` NeilBrown
  2011-09-27 18:46 ` Thomas Fjellstrom
  0 siblings, 2 replies; 28+ messages in thread
From: Andriano @ 2011-09-13  6:14 UTC (permalink / raw)
  To: linux-raid

Hello Linux-RAID mailing list,

I have an issue with my RAID6 array.
Here goes a short description of the system:

opensuse 11.4
Linux 3.0.4-2-desktop #1 SMP PREEMPT Wed Aug 31 09:30:44 UTC 2011
(a432f18) x86_64 x86_64 x86_64 GNU/Linux
Gigabyte EP35C-DS3 motherboard with 8 SATA ports + SuperMicro
AOC-SASLP-MV8 based on Marvel 6480, firmware updated to 3.1.0.21
running mdadm 3.2.2; a single array consisting of 10 2TB disks, 8 of
them connected to the HBA and 2 to motherboard ports

I had some issues with one of the onboard-connected disks, so I tried
plugging it into different ports, just to rule out a possibly faulty
port. After a reboot, other drives suddenly got kicked out of the
array. Re-assembling them gives weird errors.

--- some output ---
[3:0:0:0]    disk    ATA      ST2000DL003-9VT1 CC32  /dev/sdb
[5:0:0:0]    disk    ATA      ST2000DL003-9VT1 CC32  /dev/sdc
[8:0:0:0]    disk    ATA      ST32000542AS     CC34  /dev/sdd
[8:0:1:0]    disk    ATA      ST32000542AS     CC34  /dev/sde
[8:0:2:0]    disk    ATA      ST32000542AS     CC34  /dev/sdf
[8:0:3:0]    disk    ATA      ST32000542AS     CC34  /dev/sdg
[8:0:4:0]    disk    ATA      ST32000542AS     CC34  /dev/sdh
[8:0:5:0]    disk    ATA      ST2000DL003-9VT1 CC32  /dev/sdi
[8:0:6:0]    disk    ATA      ST2000DL003-9VT1 CC32  /dev/sdj
[8:0:7:0]    disk    ATA      ST2000DL003-9VT1 CC32  /dev/sdk

#more /etc/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid6 UUID=82ac7386:a854194d:81b795d1:76c9c9ff

#mdadm --assemble --force --scan /dev/md0
mdadm: failed to add /dev/sdc to /dev/md0: Invalid argument
mdadm: failed to add /dev/sdb to /dev/md0: Invalid argument
mdadm: failed to add /dev/sdh to /dev/md0: Invalid argument
mdadm: /dev/md0 assembled from 7 drives - not enough to start the array.

dmesg:
[ 8215.651860] md: sdc does not have a valid v1.2 superblock, not importing!
[ 8215.651865] md: md_import_device returned -22
[ 8215.652384] md: sdb does not have a valid v1.2 superblock, not importing!
[ 8215.652388] md: md_import_device returned -22
[ 8215.653177] md: sdh does not have a valid v1.2 superblock, not importing!
[ 8215.653182] md: md_import_device returned -22

mdadm -E /dev/sd[b-k] gives exactly the same Magic number and Array
UUID for every disk, and all checksums are correct. The only
difference is Avail Dev Size: it is 3907028896 for nine of the disks
and 3907028864 for sdc.

mdadm --assemble --force --update summaries /dev/sd.. didn't improve
anything either.
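
(I guess the raw device sizes and any HPA clipping could also be
cross-checked directly - just a sketch, I haven't gone down that path
yet; device names as above:

for d in /dev/sd[b-k]; do echo -n "$d "; blockdev --getsz $d; done
hdparm -N /dev/sdc

blockdev --getsz prints the size in 512-byte sectors, and hdparm -N
shows the current and native max-sectors values, so a disk clipped by
an HPA should show up smaller than its siblings.)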


I would really appreciate it if someone could point me in the right direction.

thanks

Andrew

* Re: RAID6 issues
@ 2011-09-13 14:24 NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-09-13 14:24 UTC (permalink / raw)
  To: Andriano, Roman Mamedov; +Cc: linux-raid

(stupid android mail client insists on top-posting - sorry)
No.  You cannot (easily) get that device to be an active member of
the array again, and it almost certainly wouldn't help anyway.

It would only help if the data you want is on the device, and the
parity blocks that are being used to recreate it are corrupt.
I think it very unlikely that they are corrupt but the data isn't.

The problem seems to be that the journal superblock is bad.  That seems
to suggest that much of the rest of the filesystem is OK.
I would suggest you "fsck -n -f" the device and see how much it wants
to 'fix'.  If it is just a few things, I would just let fsck fix it up for you.
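
Something along these lines, read-only, using one of the LV names from
your earlier mail:

fsck -n -f /dev/mapper/vg0-lv1

-n answers "no" to every repair prompt so nothing gets written, and -f
forces a full check even if the filesystem claims to be clean.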

If there are pages and pages of errors - then you have bigger problems.

NeilBrown


Andriano <chief000@gmail.com> wrote:

>Still trying to get the array back up.
>
>Status: Clean, degraded with 9 out of 10 disks.
>One disk - removed as non-fresh.
>
>as a result two of LVs could not be mounted:
>
>mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lv1,
>      missing codepage or helper program, or other error
>      In some cases useful info is found in syslog - try
>      dmesg | tail  or so
>
>mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lv2,
>      missing codepage or helper program, or other error
>      In some cases useful info is found in syslog - try
>      dmesg | tail  or so
>
>[ 3357.006833] JBD: no valid journal superblock found
>[ 3357.006837] EXT4-fs (dm-1): error loading journal
>[ 3357.022603] JBD: no valid journal superblock found
>[ 3357.022606] EXT4-fs (dm-2): error loading journal
>
>
>
>Apparently there is a problem with re-adding the non-fresh disk back to the array.
>
>#mdadm -a -v /dev/md0 /dev/sdf
>mdadm: /dev/sdf reports being an active member for /dev/md0, but a
>--re-add fails.
>mdadm: not performing --add as that would convert /dev/sdf in to a spare.
>mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdf" first.
>
>Question: Is there a way to resync the array using that non-fresh
>disk, as it may contain blocks needed by these LVs?
>At this stage I don't really want to add this disk as a spare.
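>
>(What I have in mind - just a sketch, I haven't run it, and I'm not
>sure --force would actually pull a stale member back in when nine
>fresh ones are enough to start the array - is something like:
>
>#mdadm --stop /dev/md0
>#mdadm --assemble --force /dev/md0 /dev/sd[b-k]
>
>but I'd rather check here first.)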
>
>Any suggestions please?
>
>
>thanks
>
>On Tue, Sep 13, 2011 at 8:44 PM, Andriano <chief000@gmail.com> wrote:
>> Thanks everyone, looks like the problem is solved.
>>
>> For benefit of others who may experience same issue, here is what I've done:
>>
>> - upgraded the firmware on the ST32000542AS disks from CC34 to CC35.
>> This must be done using an onboard SATA port in Native IDE (not
>> RAID/AHCI) mode. After reconnecting them to the HBA, the size of one
>> of the offenders fixed itself!
>>
>> - ran hdparm -N p3907029168 /dev/sdx on the other two disks and it
>> worked (it probably works straight after a reboot).
>> Now mdadm -D shows the array as clean, degraded with one disk kicked
>> out, which is another story :)
>>
>> now I need to resync the array and restore the two LVs which haven't mounted :(
>>
>> On Tue, Sep 13, 2011 at 8:29 PM, Roman Mamedov <rm@romanrm.ru> wrote:
>>> On Tue, 13 Sep 2011 19:05:41 +1000
>>> Andriano <chief000@gmail.com> wrote:
>>>
>>>> Connected one of the offenders to HBA port, and hdparm outputs this:
>>>>
>>>> #hdparm -N /dev/sdh
>>>>
>>>> /dev/sdh:
>>>>  max sectors   = 3907027055/14715056(18446744073321613488?), HPA
>>>> setting seems invalid (buggy kernel device driver?)
>>>
>>> You could just try "hdparm -N p3907029168" (capacity of the 'larger' disks), but that could fail if the device driver is indeed buggy.
>>>
>>> Another possible course of action would be to try that on some other controller.
>>> For example, your motherboard has two violet ports (http://www.gigabyte.ru/products/upload/products/1470/100a.jpg)
>>> managed by the JMicron JMB363 controller; try plugging the disks that need the HPA removed into those ports. AFAIR that JMicron controller works with "hdparm -N" just fine.
>>>
>>> --
>>> With respect,
>>> Roman
>>>
>>


Thread overview: 28+ messages
2011-06-16 20:28 raid6 issues Chad Walker
2011-06-18 19:48 ` Chad Walker
2011-06-18 19:55   ` Chad Walker
2011-06-18 23:01     ` NeilBrown
2011-06-18 23:14       ` Chad Walker
  -- strict thread matches above, loose matches on Subject: below --
2011-09-13  6:14 RAID6 issues Andriano
2011-09-13  6:25 ` NeilBrown
2011-09-13  6:33   ` Andriano
2011-09-13  6:44     ` NeilBrown
2011-09-13  7:05       ` Andriano
2011-09-13  7:38         ` NeilBrown
2011-09-13  7:51           ` Andriano
2011-09-13  8:10             ` NeilBrown
2011-09-13  8:12             ` Alexander Kühn
2011-09-13  8:44             ` Roman Mamedov
2011-09-13  8:57               ` Andriano
2011-09-13  9:05                 ` Andriano
2011-09-13 10:29                   ` Roman Mamedov
2011-09-13 10:44                     ` Andriano
2011-09-13 13:45                       ` Andriano
2011-09-27 18:46 ` Thomas Fjellstrom
2011-09-27 19:14   ` Stan Hoeppner
2011-09-27 21:04     ` Thomas Fjellstrom
2011-09-28  2:47       ` Stan Hoeppner
2011-09-28  6:52         ` Thomas Fjellstrom
2011-09-28  6:03       ` Mikael Abrahamsson
2011-09-28  6:53         ` Thomas Fjellstrom
2011-09-13 14:24 NeilBrown
