* mdadm RAID6 faulty drive
From: Paramasivam, Meenakshisundaram @ 2013-03-25 16:02 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org


Hi,

As a result of an extended power outage, the Fedora Core 17 machine with mdadm RAID went down. After bringing it back up, I noticed "faulty /dev/sdf" in mdadm --detail. However, mdadm -E /dev/sdf shows "State : clean". Details are shown below. When I tried to add the drive back to the array, the resync fails (I see lots of eSATA bus resets), and I get the same message in mdadm --detail.

Questions:
1. How can a clean drive be reported faulty?
2. Is there an easy way to mark the drive (/dev/sdf) as "assume-clean" and add it?

Please let me know if I should get an exact replacement drive at this stage, pull out the faulty /dev/sdf, and add the new drive to the array. Thanks.

Details:
#mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Thu Dec 20 13:08:56 2012
     Raid Level : raid6
     Array Size : 11720297472 (11177.35 GiB 12001.58 GB)
  Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Mon Mar 25 11:37:12 2013
          State : clean, degraded 
 Active Devices : 7
Working Devices : 7
Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : RAIDvol1
           UUID : 8a9eee70:89f2639b:68f5350d:11f444fe
         Events : 1494

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       96        1      active sync   /dev/sdg
       2       8      112        2      active sync   /dev/sdh
       3       8      128        3      active sync   /dev/sdi
       8       8       16        4      active sync   /dev/sdb
       5       8       32        5      active sync   /dev/sdc
       6       8       48        6      active sync   /dev/sdd
       7       8       64        7      active sync   /dev/sde

       9       8       80        -      faulty   /dev/sdf

# mdadm -E /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : 8a9eee70:89f2639b:68f5350d:11f444fe
           Name : RAIDvol1
  Creation Time : Thu Dec 20 13:08:56 2012
     Raid Level : raid6
   Raid Devices : 8

Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 11720297472 (11177.35 GiB 12001.58 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
Recovery Offset : 0 sectors
          State : clean
    Device UUID : 0a58fa5c:4d10f401:07dead3a:ec844676

    Update Time : Fri Mar 22 14:25:13 2013
       Checksum : ba455125 - correct
         Events : 1043

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAAAA ('A' == active, '.' == missing)
#

Sundar


* Re: mdadm RAID6 faulty drive
From: Phil Turmel @ 2013-03-25 17:43 UTC (permalink / raw)
  To: Paramasivam, Meenakshisundaram; +Cc: linux-raid@vger.kernel.org

On 03/25/2013 12:02 PM, Paramasivam, Meenakshisundaram wrote:
> 
> Hi,
> 
> As a result of extended power outage, the FedoraCore 17 machine with
> mdadm RAID went down. Bringing it up, I noticed "faulty /dev/sdf" in
> mdadm -detail. However mdadm -E /dev/sdf shows "State : clean".
> Details are shown below. When I tried to add the drive to array,
> resync fails (I see lots of eSATA bus resets), and I get the same
> message in mdadm -detail.
> 
> Questions:
> 1. How can a clean drive be reported faulty?

When a drive is kicked out for I/O errors, its superblock is left as-is
(just as if you had pulled its SATA cable).  The remaining devices'
superblocks are marked to show the failed drive, and *their*
superblocks' event counts are bumped.  The failed status of that device
is derived during assembly, when its superblock is found to be stale.
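The mismatch is visible in the output quoted earlier in this thread; a quick sketch comparing the two event counters (values copied from the mdadm output above):

```shell
# Event counters copied from the mdadm output earlier in this thread:
array_events=1494   # "Events : 1494" in mdadm --detail /dev/md2
sdf_events=1043     # "Events : 1043" in mdadm -E /dev/sdf

# A member whose counter lags behind the array's is treated as stale:
if [ "$sdf_events" -lt "$array_events" ]; then
  echo "/dev/sdf is stale by $((array_events - sdf_events)) events"
fi
```

This prints "/dev/sdf is stale by 451 events" — the drive's last update (Fri Mar 22) predates the array's (Mon Mar 25).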

> 2. Is there a easy way to mark drive (/dev/sdf) as "assume-clean" and
> add it?

No.  The closest thing is to use a write-intent bitmap and "re-add"
devices that are disconnected.
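For reference, that sequence looks roughly like this (a sketch only, not a fix for this failure; the DRYRUN=echo guard makes it safe to paste, drop it to actually run the commands as root):

```shell
DRYRUN=echo

# Add an internal write-intent bitmap so brief disconnects resync quickly:
$DRYRUN mdadm --grow /dev/md2 --bitmap=internal

# Re-add a dropped member; with a bitmap, only the dirty chunks resync:
$DRYRUN mdadm /dev/md2 --re-add /dev/sdf
```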

That's not your problem.

> Please let me know if I should get an exact  replacement drive at
> this stage, pull out faulty /dev/sdf, and add the new drive to array.
> Thanks.

You very likely need a new drive.  You might want to try plugging that
drive into a different controller, or a different port on the same
controller, just to narrow the diagnosis.

You could also show us some of the kernel error messages, or show the
output of "smartctl -x /dev/sdf".

Phil


* Re: mdadm RAID6 faulty drive
From: Roy Sigurd Karlsbakk @ 2013-03-26 20:05 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, Meenakshisundaram Paramasivam

> You could also show us some of the kernel error messages, or show the
> output of "smartctl -x /dev/sdf".

I second that. Also, if the command above shows no errors, run self-tests with smartctl -t short and -t long. If those succeed, you may try to re-add the drive, but I wouldn't use --assume-clean. Better to go through a normal rebuild and see whether any errors occur.
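A sketch of that sequence (again with a DRYRUN=echo guard so it is safe to paste; drop it to run the commands for real, as root):

```shell
DRYRUN=echo

$DRYRUN smartctl -x /dev/sdf           # extended attributes and error logs
$DRYRUN smartctl -t short /dev/sdf     # short self-test (a couple of minutes)
$DRYRUN smartctl -t long /dev/sdf      # long, full-surface self-test
$DRYRUN smartctl -l selftest /dev/sdf  # read the self-test results afterwards
```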

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases adequate and relevant synonyms exist in Norwegian.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: mdadm RAID6 faulty drive
From: Paramasivam, Meenakshisundaram @ 2013-03-27 20:46 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid@vger.kernel.org


Thanks for the smartctl suggestion. Though smartctl -t short /dev/sdf passed, I was still unable to add the drive.

Then I did this:
#mdadm /dev/md2 --remove /dev/sdf
#mdadm --stop /dev/md2  (after stopping all processes using the array)
#dd if=/dev/zero of=/dev/sdf
dd: writing to `/dev/sdf': Input/output error
8522041+0 records in
8522040+0 records out
4363284480 bytes (4.4 GB) copied, 138.782 s, 31.4 MB/s
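Those numbers are self-consistent: dd's default block size is 512 bytes, so the write failed about 4.4 GB into the disk:

```shell
records_out=8522040            # complete records written before the error
bytes=$((records_out * 512))   # dd's default block size is 512 bytes
echo "$bytes"                  # prints 4363284480, matching dd's byte count
```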

Appears to be a bad drive; I will replace it.

Sundar

