From: Robin Hill <robin@robinhill.me.uk>
To: "Niccolò Belli" <darkbasic@linuxsystems.it>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid1 issue after disk failure: both disks of the array are still active
Date: Sat, 15 Sep 2012 20:41:02 +0100
Message-ID: <20120915194102.GA10403@cthulhu.home.robinhill.me.uk>
In-Reply-To: <5054D175.5070303@linuxsystems.it>
On Sat Sep 15, 2012 at 09:05:25 +0200, Niccolò Belli wrote:
> CHECK didn't help me, so I did an echo "repair" >
> /sys/block/md0/md/sync_action. REPAIR didn't work either :(
>
It didn't do what you wanted, anyway. It may well have worked for its
intended purpose.
> Here is syslog of REPAIR:
>
> Sep 15 19:34:10 asterisk mdadm[2117]: RebuildStarted event detected on
> md device /dev/md/0
> Sep 15 19:34:10 asterisk kernel: [258470.152296] md: requested-resync of
> RAID array md0
> Sep 15 19:34:10 asterisk kernel: [258470.152301] md: minimum
> _guaranteed_ speed: 1000 KB/sec/disk.
> Sep 15 19:34:10 asterisk kernel: [258470.152304] md: using maximum
> available idle IO bandwidth (but not more than 200000 KB/sec) for
> requested-resync.
> Sep 15 19:34:10 asterisk kernel: [258470.152310] md: using 128k window,
> over a total of 311619448k.
> Sep 15 19:34:11 asterisk kernel: [258471.165653] ata3.00: exception
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:11 asterisk kernel: [258471.167468] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:11 asterisk kernel: [258471.169912] ata3.00: failed
> command: READ DMA EXT
> Sep 15 19:34:11 asterisk kernel: [258471.172769] ata3.00: cmd
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:11 asterisk kernel: [258471.172771] res
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:11 asterisk kernel: [258471.176753] ata3.00: status: { DRDY
> ERR }
> Sep 15 19:34:11 asterisk kernel: [258471.178605] ata3.00: error: { UNC }
> Sep 15 19:34:12 asterisk kernel: [258472.148217] ata3.00: configured for
> UDMA/133
> Sep 15 19:34:12 asterisk kernel: [258472.148232] ata3: EH complete
> Sep 15 19:34:13 asterisk kernel: [258473.131054] ata3.00: exception
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:13 asterisk kernel: [258473.132881] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:13 asterisk kernel: [258473.134639] ata3.00: failed
> command: READ DMA EXT
> Sep 15 19:34:13 asterisk kernel: [258473.136413] ata3.00: cmd
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:13 asterisk kernel: [258473.136415] res
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:13 asterisk kernel: [258473.141768] ata3.00: status: { DRDY
> ERR }
> Sep 15 19:34:13 asterisk kernel: [258473.144049] ata3.00: error: { UNC }
> Sep 15 19:34:14 asterisk kernel: [258474.112209] ata3.00: configured for
> UDMA/133
> Sep 15 19:34:14 asterisk kernel: [258474.112224] ata3: EH complete
> Sep 15 19:34:15 asterisk kernel: [258475.071642] ata3.00: exception
> Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 19:34:15 asterisk kernel: [258475.073476] ata3.00: BMDMA stat 0x44
> Sep 15 19:34:15 asterisk kernel: [258475.075240] ata3.00: failed
> command: READ DMA EXT
> Sep 15 19:34:15 asterisk kernel: [258475.077027] ata3.00: cmd
> 25/00:00:00:15:00/00:04:00:00:00/e0 tag 0 dma 524288 in
> Sep 15 19:34:15 asterisk kernel: [258475.077029] res
> 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 19:34:15 asterisk kernel: [258475.080720] ata3.00: status: { DRDY
> ERR }
> Sep 15 19:34:15 asterisk kernel: [258475.083512] ata3.00: error: { UNC }
> Sep 15 19:34:16 asterisk kernel: [258476.100935] ata3.00: configured for
> UDMA/133
> Sep 15 19:34:16 asterisk kernel: [258476.100960] ata3: EH complete
> Sep 15 19:41:29 asterisk asterisk[3492]: rc_avpair_new: unknown
> attribute 1490026597
> Sep 15 19:41:46 asterisk asterisk[3492]: rc_avpair_new: unknown
> attribute 1490026597
> Sep 15 19:41:52 asterisk asterisk[3492]: rc_avpair_new: unknown
> attribute 1490026597
> Sep 15 19:42:52 asterisk asterisk[3492]: rc_avpair_new: unknown
> attribute 1490026597
> Sep 15 19:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2
> Currently unreadable (pending) sectors
> Sep 15 19:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline
> uncorrectable sectors
> Sep 15 19:50:51 asterisk mdadm[2117]: Rebuild26 event detected on md
> device /dev/md/0
> Sep 15 20:07:31 asterisk mdadm[2117]: Rebuild53 event detected on md
> device /dev/md/0
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2
> Currently unreadable (pending) sectors
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline
> uncorrectable sectors
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT],
> Temperature changed +4 Celsius to 42 Celsius (Min/Max 30/46)
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sda [SAT], SMART
> Usage Attribute: 201 Soft_Read_Error_Rate changed from 99 to 100
> Sep 15 20:16:34 asterisk smartd[2581]: Device: /dev/sdb [SAT], SMART
> Usage Attribute: 190 Airflow_Temperature_Cel changed from 61 to 60
> Sep 15 20:24:11 asterisk mdadm[2117]: Rebuild75 event detected on md
> device /dev/md/0
> Sep 15 20:40:51 asterisk mdadm[2117]: Rebuild93 event detected on md
> device /dev/md/0
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 2
> Currently unreadable (pending) sectors
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], 1 Offline
> uncorrectable sectors
> Sep 15 20:46:34 asterisk smartd[2581]: Device: /dev/sda [SAT], SMART
> Usage Attribute: 190 Airflow_Temperature_Cel changed from 61 to 60
> Sep 15 20:47:24 asterisk kernel: [262863.781068] md: md0:
> requested-resync done.
> Sep 15 20:47:24 asterisk mdadm[2117]: RebuildFinished event detected on
> md device /dev/md/0
>
>
Okay, so the drive logs an exception at 19:34:11, then completes its
error handling at 19:34:16.
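For reference, the cmd and res lines in that log can be decoded to see
which sector the drive actually complained about. This is only a rough
sketch: the field order (first byte, then feature/error, nsect, lbal,
lbam, lbah, then the five "hob" high-order bytes, then device) is my
understanding of how libata prints the taskfile, and decode_taskfile is
a made-up helper name, not part of any tool.

```python
import re

# Assumed libata layout (an assumption, not taken from this thread):
#   XX / feature-or-error : nsect : lbal : lbam : lbah
#      / hob_feature : hob_nsect : hob_lbal : hob_lbam : hob_lbah / device
TASKFILE = re.compile(
    r"^([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})$"
)


def decode_taskfile(text):
    """Decode an LBA48 taskfile string from a libata 'cmd' or 'res' line."""
    match = TASKFILE.match(text.strip().lower())
    if match is None:
        raise ValueError("not a taskfile string: %r" % text)
    low = [int(b, 16) for b in match.group(2).split(":")]
    high = [int(b, 16) for b in match.group(3).split(":")]
    _, nsect, lbal, lbam, lbah = low
    _, hob_nsect, hob_lbal, hob_lbam, hob_lbah = high
    # The 48-bit LBA is split into six bytes, high-order bytes in the hob group.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # The sector count is only meaningful on the 'cmd' line.
    sectors = (hob_nsect << 8) | nsect
    return {"first_byte": int(match.group(1), 16), "lba": lba, "sectors": sectors}


if __name__ == "__main__":
    cmd = decode_taskfile("25/00:00:00:15:00/00:04:00:00:00/e0")
    res = decode_taskfile("51/40:00:90:17:00/40:00:00:00:00/e0")
    print("request: LBA", cmd["lba"], "for", cmd["sectors"], "sectors")
    print("failed at LBA", res["lba"])
```

If that layout is right, the request was a 1024-sector read starting at
LBA 5376 (which matches the "dma 524288" in the log), and the drive
reported the failure at LBA 6032. Those are sector numbers on the whole
disk, not offsets within the partition or the md array.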
If md hasn't failed the drive then either:
  - md didn't get a read error
  - md got a success message when re-writing the block
  - there's a bug in md and it hasn't handled the error at all
My guess would be one of the first two (I'm not sure what's logged if
md gets a read error and does a re-write).
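One way to narrow down which of those cases applies might be to look at
the counters md exposes in sysfs. The sketch below assumes the layout
described in the kernel's md documentation (mismatch_cnt and sync_action
under /sys/block/md0/md/, plus a per-member dev-*/errors count of read
errors that did not get the device kicked out); whether every file is
present depends on the kernel version, and md_error_state is just a name
I picked.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def md_error_state(array="md0", sysfs_root="/sys"):
    """Collect the md counters relevant to a corrected or ignored read error.

    sysfs_root is a parameter only so the function can be pointed at a
    copy of the tree; on a live system the default is what you want.
    """
    md_dir = Path(sysfs_root) / "block" / array / "md"
    state = {
        "sync_action": _read(md_dir / "sync_action"),
        "mismatch_cnt": _read(md_dir / "mismatch_cnt"),
        "devices": {},
    }
    # Each array member appears as a dev-<name> directory, e.g. dev-sda1.
    for dev_dir in sorted(md_dir.glob("dev-*")):
        state["devices"][dev_dir.name[len("dev-"):]] = {
            "state": _read(dev_dir / "state"),
            "errors": _read(dev_dir / "errors"),
        }
    return state


if __name__ == "__main__":
    for key, value in md_error_state().items():
        print(key, value)
```

A non-zero errors count on the suspect member with the device still
in_sync would point at the second case (md saw the read error and the
re-write succeeded); a zero count would suggest md never saw an error.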
>
> I still get:
>
> Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Offline           Completed: read failure  90%        8985             3912
>
> and
>
> 197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always   -  2
> 198 Offline_Uncorrectable   0x0030  100  100  000  Old_age  Offline  -  1
>
>
> How is it possible? Next thing I will try is manually failing /dev/sda
> and filling it with zeros. I would like to do a *low level format* but I
> didn't find the utility for my disk :(
>
I'm pretty sure there's no such thing as a *low level format* for any
modern disk (or not one that does anything more than write a known
pattern to the disk). The low-level information is laid out far too
precisely for the disk's own heads to be able to rewrite it.
Writing zeros is certainly what I'd do in this situation - I've done it
for several drives in the past where they've had offline uncorrectable
sectors flagged.
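If you only want to overwrite the suspect sectors rather than the whole
disk, something along these lines would do it. Treat it as a sketch
under assumptions: zero_sectors is a hypothetical helper, the 512-byte
sector size is assumed rather than read from the drive, and the example
at the bottom writes to a scratch file, not to a disk. It destroys
whatever is in the range, so it should only ever be pointed at a device
that has already been failed and removed from the array.

```python
import os
import tempfile


def zero_sectors(path, first_lba, count, sector_size=512):
    """Overwrite `count` sectors starting at `first_lba` with zeros.

    DESTRUCTIVE: anything stored in that range is lost.
    sector_size=512 is an assumption; check the real logical sector size.
    """
    total = count * sector_size
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, first_lba * sector_size, os.SEEK_SET)
        remaining = total
        chunk = bytes(min(remaining, 1 << 20))  # zero-filled buffer, 1 MiB max
        while remaining > 0:
            # os.write may write less than requested, so track what is left.
            remaining -= os.write(fd, chunk[:remaining])
        os.fsync(fd)  # push the data out of the page cache
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    # Demonstration on a scratch file standing in for the disk.
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(b"\xff" * 512 * 8)
    print(zero_sectors(scratch.name, first_lba=2, count=2), "bytes zeroed")
    os.unlink(scratch.name)
```

Afterwards the pending count in the SMART attributes should show
whether the drive was able to rewrite or remap the sectors.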
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |