degraded raid troubleshooting

From: Stephen Burke @ 2014-11-20 13:41 UTC
  To: linux-raid

I woke up this morning to my PC refusing to boot, saying that my RAID
was in a degraded state.  I looked at the RAID wiki, and it told me to
stop what I was doing and mail the linux-raid list before doing
anything hasty.

Here's all the info that I could find out about it.  Any help would be
appreciated.
I am running Ubuntu 12.04
mdadm - v3.2.5 - 18th May 2012

The drive in question is /dev/sdb1 on my system.  I tried to look at
it via fdisk, but fdisk hangs.  What should my first steps be to figure
out whether this drive is bad and, if so, to replace it?  Thanks.


sburke@ht-pc:/tmp/logs$ sudo mdadm --detail /dev/md0
[sudo] password for sburke:
/dev/md0:
        Version : 1.2
  Creation Time : Fri Dec 13 01:18:13 2013
     Raid Level : raid5
     Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
  Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Nov 20 01:16:00 2014
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ht-pc:0  (local to host ht-pc)
           UUID : 508cb42f:d2c1ea9c:e62b4121:c3d9cbc3
         Events : 140

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       33        1      active sync   /dev/sdc1
       3       8       65        2      active sync   /dev/sde1

sburke@ht-pc:/tmp/logs$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sde1[3] sdc1[1]
      3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>

syslog

Nov 20 01:14:53 ht-pc kernel: [    2.465076]          res 41/40:08:09:08:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>
Nov 20 01:14:53 ht-pc kernel: [    2.465078] ata2.00: status: { DRDY ERR }
Nov 20 01:14:53 ht-pc kernel: [    2.465079] ata2.00: error: { UNC }
Nov 20 01:14:53 ht-pc kernel: [    2.484536] ata2.00: configured for UDMA/133
Nov 20 01:14:53 ht-pc kernel: [    2.484543] ata2: EH complete
Nov 20 01:14:53 ht-pc kernel: [    3.131754] ata2.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
Nov 20 01:14:53 ht-pc kernel: [    3.131756] ata2.00: irq_stat 0x40000008
Nov 20 01:14:53 ht-pc kernel: [    3.131758] ata2.00: failed command: READ FPDMA QUEUED
Nov 20 01:14:53 ht-pc kernel: [    3.131762] ata2.00: cmd 60/08:30:08:08:00/00:00:00:00:00/40 tag 6 ncq 4096 in
Nov 20 01:14:53 ht-pc kernel: [    3.131763]          res 41/40:08:09:08:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>

-- 
Steve
www.stayathomedevs.com

Game Data Editor Unity Plugin


* Re: degraded raid troubleshooting
From: Phil Turmel @ 2014-11-20 23:16 UTC
  To: Stephen Burke, linux-raid

Hi Stephen,

On 11/20/2014 08:41 AM, Stephen Burke wrote:
> I woke up this morning to my PC refusing to boot, saying that my RAID
> was in a degraded state.  I looked at the RAID wiki, and it told me to
> stop what I was doing and mail the linux-raid list before doing
> anything hasty.

:-)

> Here's all the info that I could find out about it.  Any help would be
> appreciated.
> I am running Ubuntu 12.04
> mdadm - v3.2.5 - 18th May 2012
> 
> The drive in question is /dev/sdb1 on my system.  I tried to look at
> it via fdisk, but fdisk hangs.  What should my first steps be to figure
> out whether this drive is bad and, if so, to replace it?  Thanks.

Good news: your data is still safe, and already assembled (ready to
use).  The boot failure is a one-time warning that the number of drives
available at shutdown didn't match the available drives at bootup.
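
For reference, the degraded state can be read straight out of
/proc/mdstat: the "[3/2] [_UU]" field above means three configured
members with two active.  A minimal sketch of that check follows.  The
function name is made up for illustration, and it assumes array names
of the plain "mdN" form.

```shell
#!/bin/sh
# List md arrays that /proc/mdstat reports as degraded.
# Reads mdstat-format text on stdin and prints one line per degraded
# array: "<array> <configured>/<active>".
# degraded_arrays is a hypothetical helper name, not part of mdadm.
degraded_arrays() {
    awk '
        # An array stanza starts with a line like "md0 : active raid5 ...".
        /^md[0-9]+ :/ { array = $1 }
        # The status line carries a field like "[3/2]":
        # configured members / active members.
        array != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print array, n[1] "/" n[2]
            array = ""
        }
    '
}

# Usage:
#   degraded_arrays < /proc/mdstat
```

Run against the mdstat output quoted in this thread, it should report
md0 as 3/2.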

> syslog
>
> Nov 20 01:14:53 ht-pc kernel: [    2.465076]          res 41/40:08:09:08:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>
> Nov 20 01:14:53 ht-pc kernel: [    2.465078] ata2.00: status: { DRDY ERR }
> Nov 20 01:14:53 ht-pc kernel: [    2.465079] ata2.00: error: { UNC }
> Nov 20 01:14:53 ht-pc kernel: [    2.484536] ata2.00: configured for UDMA/133
> Nov 20 01:14:53 ht-pc kernel: [    2.484543] ata2: EH complete
> Nov 20 01:14:53 ht-pc kernel: [    3.131754] ata2.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
> Nov 20 01:14:53 ht-pc kernel: [    3.131756] ata2.00: irq_stat 0x40000008
> Nov 20 01:14:53 ht-pc kernel: [    3.131758] ata2.00: failed command: READ FPDMA QUEUED
> Nov 20 01:14:53 ht-pc kernel: [    3.131762] ata2.00: cmd 60/08:30:08:08:00/00:00:00:00:00/40 tag 6 ncq 4096 in
> Nov 20 01:14:53 ht-pc kernel: [    3.131763]          res 41/40:08:09:08:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>

Bad news: that drive is very likely dead.  It didn't communicate at all.
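
To see how widespread the errors are, it can help to tally the
"error: { UNC }" lines per ATA device across the whole log.  The sketch
below does that.  The function name is hypothetical, and it assumes the
syslog line format shown in the quoted excerpt.

```shell
#!/bin/sh
# Count kernel-reported uncorrectable read errors (UNC) per ATA device.
# Reads syslog text on stdin and prints "<ata device> <count>" lines.
# unc_errors_by_port is a hypothetical helper name.
unc_errors_by_port() {
    awk '
        # Only consider lines that carry the UNC error marker.
        index($0, "error: { UNC }") {
            # Find the "ataN.MM:" field and strip the trailing colon.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                    dev = $i
                    sub(/:$/, "", dev)
                    count[dev]++
                    break
                }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Usage (the log path is an assumption; adjust for your system):
#   unc_errors_by_port < /var/log/syslog
```

Mapping a port such as ata2 back to a /dev/sdX name is a separate step.
On kernels that expose the ATA port in sysfs paths, "ls -l /sys/block/"
may show it, but treat that as something to verify on your own system.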

If you replace the drive and the replacement works, I would count that
as definitive proof of a bad drive.  Until then, it could still be a
cable or controller problem.  Such things happen.
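
If the drive still answers at all, its SMART attributes may help
separate a failing disk from a cabling problem.  The sketch below
filters "smartctl -A" output down to a few attributes commonly used
for that.  It assumes smartmontools is installed and that the drive
uses these attribute names, which vary by vendor.  The function name is
hypothetical.

```shell
#!/bin/sh
# Print the raw values of SMART attributes often used to tell a failing
# disk from a bad cable.  Reads "smartctl -A" output on stdin and prints
# "<attribute name> <raw value>" lines.
# smart_summary is a hypothetical helper name.
smart_summary() {
    # Column 2 is the attribute name, column 10 the raw value.
    awk '
        $2 == "Reallocated_Sector_Ct"  ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable"  ||
        $2 == "UDMA_CRC_Error_Count" { print $2, $10 }
    '
}

# Usage (as root; /dev/sdb is the suspect drive from the original post):
#   smartctl -A /dev/sdb | smart_summary
```

As a rough guide, nonzero pending or reallocated sector counts point at
the disk itself, while a growing CRC error count is more typical of a
cable or connector fault.  Since fdisk hangs on this drive, smartctl
may hang as well.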

Before adding the new drive, though, I would show us the "mdadm -E"
reports for each of the surviving member devices, just in case you
encounter a problem during the rebuild (ridiculously common for big
drives in raid5).
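
One way to capture those reports is sketched below, assuming the
surviving members are /dev/sdc1 and /dev/sde1 as shown in the
"mdadm --detail" output.  The function name and the output directory
are made up for illustration.

```shell
#!/bin/sh
# Save "mdadm -E" (examine) output for each given member device into a
# directory, one file per device.  Must be run as root.
# save_examine_reports is a hypothetical helper name.
save_examine_reports() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # e.g. /dev/sdc1 -> <outdir>/sdc1.examine.txt
        mdadm -E "$dev" > "$outdir/$(basename "$dev").examine.txt" || return 1
    done
}

# Usage (the directory name is an assumption):
#   save_examine_reports /root/md0-reports /dev/sdc1 /dev/sde1
```

Keeping the reports in files means they can be pasted into a follow-up
mail if the rebuild runs into trouble.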

Anyways, use "mdadm /dev/md0 --add /dev/sdX1" after you partition the
new drive.  That'll start the rebuild.
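
Put together, the replacement sequence might look like the sketch
below.  It only prints the commands so they can be reviewed before
anything is run.  It assumes the disks use an MBR partition table
(so that sfdisk can copy the layout from a surviving member) and that
partition 1 of the new disk will be the array member.  Neither is
confirmed in this thread.  The function name is hypothetical, and
/dev/sdX remains a placeholder for the new disk.

```shell
#!/bin/sh
# Print (do not run) the commands for swapping in a replacement disk.
#   $1 = md array, $2 = healthy member disk to copy the layout from,
#   $3 = new, empty disk.
# replacement_plan is a hypothetical helper name.
replacement_plan() {
    array=$1
    good=$2
    new=$3
    # Copy the partition layout from a surviving disk to the new one
    # (assumes an MBR partition table).
    echo "sfdisk -d $good | sfdisk $new"
    # Add the new partition to the array; md then starts the rebuild.
    echo "mdadm $array --add ${new}1"
    # Check recovery progress.
    echo "cat /proc/mdstat"
}

# Usage:
#   replacement_plan /dev/md0 /dev/sdc /dev/sdX
```

Double-check the source and target disk names before running the
sfdisk line for real, since writing a partition table to the wrong
disk would destroy a surviving member.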

Phil


