linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robin Hill <robin@robinhill.me.uk>
To: "Niccolò Belli" <darkbasic@linuxsystems.it>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid1 issue after disk failure: both disks of the array are still active
Date: Sat, 15 Sep 2012 20:41:02 +0100	[thread overview]
Message-ID: <20120915194102.GA10403@cthulhu.home.robinhill.me.uk> (raw)
In-Reply-To: <5054D175.5070303@linuxsystems.it>

[-- Attachment #1: Type: text/plain, Size: 7309 bytes --]

On Sat Sep 15, 2012 at 09:05:25 +0200, Niccolò Belli wrote:

> CHECK didn't help me, so I did a echo "repair > 
> /sys/block/md0/md/sync_action". REPAIR didn't work too :(
> 
Didn't work for what you were wanting anyway. It may well have worked
for its intended purpose.

> Here is syslog of REPAIR:
> 
> Sep 15 19:34:10 asterisk mdadm[2117]: RebuildStarted event detected on 
> md device /dev/md/0
> Sep 15 19:34:10 asterisk kernel: [258470.152296] md: requested-resync of 
> RAID array md0
> Sep 15 19:34:10 asterisk kernel: [258470.152301] md: minimum 
> _guaranteed_  speed: 1000 KB/sec/disk.
> Sep 15 19:34:10 asterisk kernel: [258470.152304] md: using maximum 
> available idle IO bandwidth (but not more than 200000 KB/sec) for 
> requested-resync.
> Sep 15 19:34:10 asterisk kernel: [258470.152310] md: using 128k window, 
> over a total of 311619448k.
> Sep 15 19:34:11 asterisk kernel: [258471.165653] ata3.00: exception 
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:11 asterisk kernel: [258471.167468] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:11 asterisk kernel: [258471.169912] ata3.00: failed 
> command: READ DMA EXT
> Sep 15 19:34:11 asterisk kernel: [258471.172769] ata3.00: cmd 
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:11 asterisk kernel: [258471.172771]          res 
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:11 asterisk kernel: [258471.176753] ata3.00: status: { DRDY 
> ERR }
> Sep 15 19:34:11 asterisk kernel: [258471.178605] ata3.00: error: { UNC }
> Sep 15 19:34:12 asterisk kernel: [258472.148217] ata3.00: configured for 
> UDMA/133
> Sep 15 19:34:12 asterisk kernel: [258472.148232] ata3: EH complete
> Sep 15 19:34:13 asterisk kernel: [258473.131054] ata3.00: exception 
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:13 asterisk kernel: [258473.132881] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:13 asterisk kernel: [258473.134639] ata3.00: failed 
> command: READ DMA EXT
> Sep 15 19:34:13 asterisk kernel: [258473.136413] ata3.00: cmd 
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:13 asterisk kernel: [258473.136415]          res 
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:13 asterisk kernel: [258473.141768] ata3.00: status: { DRDY 
> ERR }
> Sep 15 19:34:13 asterisk kernel: [258473.144049] ata3.00: error: { UNC }
> Sep 15 19:34:14 asterisk kernel: [258474.112209] ata3.00: configured for 
> UDMA/133
> Sep 15 19:34:14 asterisk kernel: [258474.112224] ata3: EH complete
> Sep 15 19:34:15 asterisk kernel: [258475.071642] ata3.00: exception 
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:15 asterisk kernel: [258475.073476] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:15 asterisk kernel: [258475.075240] ata3.00: failed 
> command: READ DMA EXT
> Sep 15 19:34:15 asterisk kernel: [258475.077027] ata3.00: cmd 
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:15 asterisk kernel: [258475.077029]          res 
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:15 asterisk kernel: [258475.080720] ata3.00: status: { DRDY 
> ERR }
> Sep 15 19:34:15 asterisk kernel: [258475.083512] ata3.00: error: { UNC }
> Sep 15 19:34:16 asterisk kernel: [258476.100935] ata3.00: configured for 
> UDMA/133
> Sep 15 19:34:16 asterisk kernel: [258476.100960] ata3: EH complete
> Sep 15 19:41:29 asterisk asterisk[3492]: rc_avpair_new: unknown 
> attribute 1490026597
> Sep 15 19:41:46 asterisk asterisk[3492]: rc_avpair_new: unknown 
> attribute 1490026597
> Sep 15 19:41:52 asterisk asterisk[3492]: rc_avpair_new: unknown 
> attribute 1490026597
> Sep 15 19:42:52 asterisk asterisk[3492]: rc_avpair_new: unknown 
> attribute 1490026597
> Sep 15 19:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2 
> Currently unreadable (pending) sectors
> Sep 15 19:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline 
> uncorrectable sectors
> Sep 15 19:50:51 asterisk mdadm[2117]: Rebuild26 event detected on md 
> device /dev/md/0
> Sep 15 20:07:31 asterisk mdadm[2117]: Rebuild53 event detected on md 
> device /dev/md/0
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2 
> Currently unreadable (pending) sectors
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline 
> uncorrectable sectors
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 
> Temperature changed +4 Celsius to 42 Celsius (Min/Max 30/46)
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], SMART 
> Usage Attribute: 201 Soft_Read_Error_Rate changed from 99 to 100
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sdb [SAT], SMART 
> Usage Attribute: 190 Airflow_Temperature_Cel changed from 61 to 60
> Sep 15 20:24:11 asterisk mdadm[2117]: Rebuild75 event detected on md 
> device /dev/md/0
> Sep 15 20:40:51 asterisk mdadm[2117]: Rebuild93 event detected on md 
> device /dev/md/0
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2 
> Currently unreadable (pending) sectors
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline 
> uncorrectable sectors
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], SMART 
> Usage Attribute: 190 Airflow_Temperature_Cel changed from 61 to 60
> Sep 15 20:47:24 asterisk kernel: [262863.781068] md: md0: 
> requested-resync done.
> Sep 15 20:47:24 asterisk mdadm[2117]: RebuildFinished event detected on 
> md device /dev/md/0
> 
> 
Okay, so the drive logs an exception at 19:34:11, then completes its
error handling at 19:34:16.

If md hasn't failed the drive then either:
  - md didn't get a read error
  - md got a success message when re-writing the block
  - there's a bug in md and it's not handled the error at all

My guess would be on one of the first two (I'm not sure what's logged if
md gets a read error and does a re-write).

> 
> I still get:
> 
> Num  Test_Description    Status                  Remaining 
> LifeTime(hours)  LBA_of_first_error
> # 1  Offline             Completed: read failure       90%      8985 
>       3912
> 
> and
> 
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always 
>        -       2
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age 
> Offline      -       1
> 
> 
> How is it possible? Next thing I will try is manually failing /dev/sda 
> and filling it with zeros. I would like to do a *low level format* but I 
> didn't find the utility for my disk :(
> 
I'm pretty sure there's no such thing as a *low level format* for any
modern disk (or not one that does anything more than writing a known
pattern to the disk). The low-level information is far too precisely
laid out for the disk heads to be able to write.

Writing zeros is certainly what I'd do in this situation - I've done it
for several drives in the past where they've had offline uncorrectable
sectors flagged.

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

  reply	other threads:[~2012-09-15 19:41 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-13 10:01 raid1 issue after disk failure: both disks of the array are still active Niccolò Belli
2012-09-13 10:34 ` Robin Hill
2012-09-13 10:46   ` Niccolò Belli
     [not found]     ` <5051BBC3.4050805@websitemanagers.com.au>
2012-09-13 11:29       ` Niccolò Belli
     [not found]     ` <CABYL=TpKD2B0vwTrHH=iFK3PcMWueEsi84ACRbBQkDXuiWG3kw@mail.gmail.com>
2012-09-13 15:32       ` Roberto Spadim
2012-09-13 15:48         ` Niccolò Belli
2012-09-13 15:53           ` Roberto Spadim
2012-09-14  7:54             ` Niccolò Belli
2012-09-13 17:02   ` Chris Murphy
2012-09-13 17:39     ` Roberto Spadim
2012-09-13 20:13       ` Chris Murphy
2012-09-14  7:16     ` Mikael Abrahamsson
2012-09-14  7:45       ` Niccolò Belli
2012-09-14 18:04         ` Chris Murphy
2012-09-14 18:27           ` Robin Hill
2012-09-14 18:53             ` Chris Murphy
2012-09-15 19:05               ` Niccolò Belli
2012-09-15 19:41                 ` Robin Hill [this message]
2012-09-15 22:06                   ` Niccolò Belli
2012-09-16 10:18                     ` Robin Hill
2012-09-16 10:42                   ` Niccolò Belli
2012-09-16 15:26                     ` Chris Murphy
2012-09-16 15:31                       ` Niccolò Belli
2012-09-16 23:35                         ` Niccolò Belli
2012-09-17  0:00                           ` Chris Murphy
2012-09-17  0:03                             ` Niccolò Belli
2012-09-14  8:13       ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120915194102.GA10403@cthulhu.home.robinhill.me.uk \
    --to=robin@robinhill.me.uk \
    --cc=darkbasic@linuxsystems.it \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).