* md resync ignoring unreadable sectors
@ 2015-02-07 21:47 Roman Mamedov
2015-02-07 22:39 ` Eyal Lebedinsky
0 siblings, 1 reply; 8+ messages in thread
From: Roman Mamedov @ 2015-02-07 21:47 UTC (permalink / raw)
To: linux-raid
Hello,
I've got some bad sectors on one drive:
dd: reading `/dev/sdh1': Input/output error
260200+0 records in
260200+0 records out
133222400 bytes (133 MB) copied, 2.97188 s, 44.8 MB/s
[ 3908.350331] ata9.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
[ 3908.350385] ata9.00: irq_stat 0x40000008
[ 3908.350427] ata9.00: failed command: READ FPDMA QUEUED
[ 3908.350474] ata9.00: cmd 60/06:90:6a:00:04/00:00:00:00:00/40 tag 18 ncq 3072 in
[ 3908.350474] res 51/40:06:6a:00:04/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[ 3908.350628] ata9.00: status: { DRDY ERR }
[ 3908.350669] ata9.00: error: { UNC }
[ 3908.354643] ata9.00: configured for UDMA/133
[ 3908.354664] sd 8:0:0:0: [sdh] Unhandled sense code
[ 3908.354668] sd 8:0:0:0: [sdh]
[ 3908.354671] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3908.354674] sd 8:0:0:0: [sdh]
[ 3908.354677] Sense Key : Medium Error [current] [descriptor]
[ 3908.354681] Descriptor sense data with sense descriptors (in hex):
[ 3908.354683] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3908.354695] 00 04 00 6a
[ 3908.354701] sd 8:0:0:0: [sdh]
[ 3908.354705] Add. Sense: Unrecovered read error - auto reallocate failed
[ 3908.354708] sd 8:0:0:0: [sdh] CDB:
[ 3908.354710] Read(10): 28 00 00 04 00 6a 00 00 06 00
[ 3908.354721] end_request: I/O error, dev sdh, sector 262250
[ 3908.354773] Buffer I/O error on device sdh1, logical block 260202
[ 3908.354825] Buffer I/O error on device sdh1, logical block 260203
[ 3908.354891] Buffer I/O error on device sdh1, logical block 260204
[ 3908.354942] Buffer I/O error on device sdh1, logical block 260205
[ 3908.354992] Buffer I/O error on device sdh1, logical block 260206
[ 3908.355042] Buffer I/O error on device sdh1, logical block 260207
[ 3908.355125] ata9: EH complete
Generally I believe these should go away when overwritten, but how do I
overwrite them? The drive is an md RAID1 member:
/dev/md4:
Version : 1.2
Creation Time : Mon May 26 13:40:18 2014
Raid Level : raid1
Array Size : 1953379936 (1862.89 GiB 2000.26 GB)
Used Dev Size : 1953379936 (1862.89 GiB 2000.26 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Feb 8 02:39:58 2015
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : natsu.romanrm.net:4 (local to host natsu.romanrm.net)
UUID : 3b8c3166:073249b5:e1384bd6:4611df90
Events : 50426
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/sdd1
1 8 113 1 active sync /dev/sdh1
I thought I would run a 'check' or 'repair', this will read from both drives,
fail to read from sdh, then try to overwrite the affected areas on sdh. But
nope:
# echo 0 > /sys/block/md4/md/sync_min
# echo check > /sys/block/md4/md/sync_action
[ 4059.451036] md: data-check of RAID array md4
[ 4059.451040] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 4059.451042] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[ 4059.451046] md: using 128k window, over a total of 1953379936k.
This happily proceeds through the supposedly unreadable area:
md4 : active raid1 sdd1[0] sdh1[1]
1953379936 blocks super 1.2 [2/2] [UU]
[>....................] check = 0.0% (1479680/1953379936) finish=1116.8min speed=29128K/sec
bitmap: 2/8 pages [8KB], 131072KB chunk
at 1.5GB already, while the unreadable sectors are at ~133MB. And no new ATA
errors in dmesg. How is this possible?
If I retry the 'dd' command right now, it fails exactly in the same way as
before (and ATA errors do indeed appear).
--
With respect,
Roman
* Re: md resync ignoring unreadable sectors
2015-02-07 21:47 md resync ignoring unreadable sectors Roman Mamedov
@ 2015-02-07 22:39 ` Eyal Lebedinsky
2015-02-07 23:04 ` Roman Mamedov
0 siblings, 1 reply; 8+ messages in thread
From: Eyal Lebedinsky @ 2015-02-07 22:39 UTC (permalink / raw)
To: linux-raid
On 08/02/15 08:47, Roman Mamedov wrote:
> Hello,
>
> I've got some bad sectors on one drive:
>
> dd: reading `/dev/sdh1': Input/output error
> 260200+0 records in
> 260200+0 records out
> 133222400 bytes (133 MB) copied, 2.97188 s, 44.8 MB/s
>
> [ 3908.350331] ata9.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
> [ 3908.350385] ata9.00: irq_stat 0x40000008
> [ 3908.350427] ata9.00: failed command: READ FPDMA QUEUED
> [ 3908.350474] ata9.00: cmd 60/06:90:6a:00:04/00:00:00:00:00/40 tag 18 ncq 3072 in
> [ 3908.350474] res 51/40:06:6a:00:04/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> [ 3908.350628] ata9.00: status: { DRDY ERR }
> [ 3908.350669] ata9.00: error: { UNC }
> [ 3908.354643] ata9.00: configured for UDMA/133
> [ 3908.354664] sd 8:0:0:0: [sdh] Unhandled sense code
> [ 3908.354668] sd 8:0:0:0: [sdh]
> [ 3908.354671] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3908.354674] sd 8:0:0:0: [sdh]
> [ 3908.354677] Sense Key : Medium Error [current] [descriptor]
> [ 3908.354681] Descriptor sense data with sense descriptors (in hex):
> [ 3908.354683] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [ 3908.354695] 00 04 00 6a
> [ 3908.354701] sd 8:0:0:0: [sdh]
> [ 3908.354705] Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3908.354708] sd 8:0:0:0: [sdh] CDB:
> [ 3908.354710] Read(10): 28 00 00 04 00 6a 00 00 06 00
> [ 3908.354721] end_request: I/O error, dev sdh, sector 262250
> [ 3908.354773] Buffer I/O error on device sdh1, logical block 260202
> [ 3908.354825] Buffer I/O error on device sdh1, logical block 260203
> [ 3908.354891] Buffer I/O error on device sdh1, logical block 260204
> [ 3908.354942] Buffer I/O error on device sdh1, logical block 260205
> [ 3908.354992] Buffer I/O error on device sdh1, logical block 260206
> [ 3908.355042] Buffer I/O error on device sdh1, logical block 260207
> [ 3908.355125] ata9: EH complete
>
> Generally I believe these should go away when overwritten, but how do I
> overwrite them? The drive is an md RAID1 member:
>
> /dev/md4:
> Version : 1.2
> Creation Time : Mon May 26 13:40:18 2014
> Raid Level : raid1
> Array Size : 1953379936 (1862.89 GiB 2000.26 GB)
> Used Dev Size : 1953379936 (1862.89 GiB 2000.26 GB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Sun Feb 8 02:39:58 2015
> State : active
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Name : natsu.romanrm.net:4 (local to host natsu.romanrm.net)
> UUID : 3b8c3166:073249b5:e1384bd6:4611df90
> Events : 50426
>
> Number Major Minor RaidDevice State
> 0 8 49 0 active sync /dev/sdd1
> 1 8 113 1 active sync /dev/sdh1
>
> I thought I would run a 'check' or 'repair', this will read from both drives,
> fail to read from sdh, then try to overwrite the affected areas on sdh. But
> nope:
>
> # echo 0 > /sys/block/md4/md/sync_min
> # echo check > /sys/block/md4/md/sync_action
>
> [ 4059.451036] md: data-check of RAID array md4
> [ 4059.451040] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> [ 4059.451042] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> [ 4059.451046] md: using 128k window, over a total of 1953379936k.
>
> This happily proceeds through the supposedly unreadable area:
>
> md4 : active raid1 sdd1[0] sdh1[1]
> 1953379936 blocks super 1.2 [2/2] [UU]
> [>....................] check = 0.0% (1479680/1953379936) finish=1116.8min speed=29128K/sec
> bitmap: 2/8 pages [8KB], 131072KB chunk
>
> at 1.5GB already, while the unreadable sectors are at ~133MB. And no new ATA
> errors in dmesg. How is this possible?
>
> If I retry the 'dd' command right now, it fails exactly in the same way as
> before (and ATA errors do indeed appear).
Hi,
I had a similar situation. In my case the bad sectors fell in an unused control area, part of the header,
which md does not read (or write) during normal operation or during a sync.
The error did not show up during normal operation (or during scrub), only during the smartctl long test.
What triggered the error for you?
I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
around the bad area also showed it to be all zeroes.
I ended up directly zeroing the bad sectors (hdparm --repair-sector ...).
YMMV
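A minimal sketch of how the sector number for such a repair could be pulled out of the kernel log, assuming the "I/O error, dev ..., sector ..." line format quoted above. The helper names are hypothetical, and the script only prints candidate commands; it does not run them. The `--read-sector` check and the `--yes-i-know-what-i-am-doing` guard are how I understand hdparm to work, so verify against your hdparm man page first. Note that the sector in that log line is relative to the whole disk (sdh), not the partition.

```python
import re
import shlex

def failed_sector(dmesg_line):
    """Extract (device, sector) from an 'I/O error, dev X, sector N' log line."""
    m = re.search(r"I/O error, dev (\w+), sector (\d+)", dmesg_line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

def hdparm_commands(dev, sector):
    """Build (but do not execute) a read check and a zeroing write for one sector."""
    disk = f"/dev/{dev}"  # whole-disk node, since the sector is disk-relative
    read_cmd = ["hdparm", "--read-sector", str(sector), disk]
    # Destructive: overwrites the sector with zeros. Assumed guard flag included.
    write_cmd = ["hdparm", "--yes-i-know-what-i-am-doing",
                 "--repair-sector", str(sector), disk]
    return [shlex.join(read_cmd), shlex.join(write_cmd)]

line = "[ 3908.354721] end_request: I/O error, dev sdh, sector 262250"
dev, sector = failed_sector(line)
for cmd in hdparm_commands(dev, sector):
    print(cmd)
```

Running the read command first confirms that the sector really fails before anything is overwritten. Zeroing is only safe where the sector holds no data you need, as in the unused area described above.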
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: md resync ignoring unreadable sectors
2015-02-07 22:39 ` Eyal Lebedinsky
@ 2015-02-07 23:04 ` Roman Mamedov
2015-02-07 23:42 ` Phil Turmel
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Roman Mamedov @ 2015-02-07 23:04 UTC (permalink / raw)
To: Eyal Lebedinsky; +Cc: linux-raid
On Sun, 08 Feb 2015 09:39:47 +1100
Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
> The error did not show up during normal operation (or during scrub), only during the smartctl long test.
> What triggered the error for you?
Just appeared during boot-up after a reboot (after 50 days uptime) which was
performed for some hardware upgrades (RAM, SATA controller). The error doesn't
go away after swapping the SATA controller for a different one.
> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
> around the bad area also showed it to be all zeroes.
I wouldn't expect mdadm to have any headers or unused areas as far as 133 MB
into a RAID member.
--
With respect,
Roman
* Re: md resync ignoring unreadable sectors
2015-02-07 23:04 ` Roman Mamedov
@ 2015-02-07 23:42 ` Phil Turmel
2015-02-07 23:49 ` Roman Mamedov
2015-02-07 23:43 ` Eyal Lebedinsky
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Phil Turmel @ 2015-02-07 23:42 UTC (permalink / raw)
To: Roman Mamedov, Eyal Lebedinsky; +Cc: linux-raid
Hi Roman,
On 02/07/2015 06:04 PM, Roman Mamedov wrote:
> On Sun, 08 Feb 2015 09:39:47 +1100
> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
>> around the bad area also showed it to be all zeroes.
>
> I wouldn't expect mdadm to have any headers or unused areas as far as 133 MB
> into a RAID member.
Look at mdadm -E for that drive and your partition start sector. I bet
Eyal is right. Latest mdadm gives me a 128MB data offset.
Phil
* Re: md resync ignoring unreadable sectors
2015-02-07 23:04 ` Roman Mamedov
2015-02-07 23:42 ` Phil Turmel
@ 2015-02-07 23:43 ` Eyal Lebedinsky
2015-02-07 23:49 ` Eyal Lebedinsky
2015-02-08 17:23 ` John Stoffel
3 siblings, 0 replies; 8+ messages in thread
From: Eyal Lebedinsky @ 2015-02-07 23:43 UTC (permalink / raw)
Cc: linux-raid
On 08/02/15 10:04, Roman Mamedov wrote:
> On Sun, 08 Feb 2015 09:39:47 +1100
> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>
>> The error did not show up during normal operation (or during scrub), only during the smartctl long test.
>> What triggered the error for you?
>
> Just appeared during boot-up after a reboot (after 50 days uptime) which was
> performed for some hardware upgrades (RAM, SATA controller). The error doesn't
> go away after swapping the SATA controller for a different one.
>
>> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
>> around the bad area also showed it to be all zeroes.
>
> I wouldn't expect mdadm to have any headers or unused areas as far as 133 MB
> into a RAID member.
Roman,
You may want to read
https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
For me:
# parted -l
...
Number Start End Size File system Name Flags
1 1049kB 4001GB 4001GB
...
# mdadm --examine /dev/sdc1
...
Data Offset : 262144 sectors
...
With required alignment etc. it may reach your 133MB.
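To put a number on that, here is a rough sketch that reads the "Data Offset" line out of `mdadm --examine` output and converts it to bytes. It assumes the value is in 512-byte sectors; the function name and the sample string are only for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: Data Offset is reported in 512-byte sectors

def data_offset_sectors(examine_text):
    """Return the 'Data Offset' value found in `mdadm --examine` output text."""
    match = re.search(r"Data Offset\s*:\s*(\d+)\s+sectors", examine_text)
    if match is None:
        raise ValueError("no 'Data Offset' line found")
    return int(match.group(1))

sample = "Data Offset : 262144 sectors"  # the line shown above
offset = data_offset_sectors(sample)
offset_bytes = offset * SECTOR_BYTES
print(offset, "sectors =", offset_bytes, "bytes =", offset_bytes // 2**20, "MiB")
```

Under that assumption the array data starts 128 MiB (about 134 MB) into the member partition, so anything below that point sits outside the area a check or repair touches.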
cheers
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: md resync ignoring unreadable sectors
2015-02-07 23:42 ` Phil Turmel
@ 2015-02-07 23:49 ` Roman Mamedov
0 siblings, 0 replies; 8+ messages in thread
From: Roman Mamedov @ 2015-02-07 23:49 UTC (permalink / raw)
To: Phil Turmel; +Cc: Eyal Lebedinsky, linux-raid
On Sat, 07 Feb 2015 18:42:04 -0500
Phil Turmel <philip@turmel.org> wrote:
> Hi Roman,
>
> On 02/07/2015 06:04 PM, Roman Mamedov wrote:
> > On Sun, 08 Feb 2015 09:39:47 +1100
> > Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>
> >> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
> >> around the bad area also showed it to be all zeroes.
> >
> > I wouldn't expect mdadm to have any headers or unused areas as far as 133 MB
> > into a RAID member.
>
> Look at mdadm -E for that drive and your partition start sector. I bet
> Eyal is right. Latest mdadm gives me a 128MB data offset.
Oh indeed:
Data Offset : 262144 sectors
The unreadable area was at about sector 260200 of sdh1, which is below that
offset, so md never reads it.
Thanks
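The comparison can be spelled out as a small sketch using the numbers from this thread. It assumes that dd used its default 512-byte records (which matches the byte count it printed) and that the "logical block" numbers in dmesg for sdh1 are also 512-byte units; the variable and function names are illustrative.

```python
SECTOR_BYTES = 512  # assumption: 512-byte units throughout

data_offset = 262144                # sectors, from mdadm --examine
dd_records_read = 260200            # full records dd copied before the I/O error
bad_blocks = range(260202, 260208)  # "logical block" numbers dmesg reported for sdh1

def in_array_data(block, offset=data_offset):
    """True if a partition-relative block lies in the area md uses for array data."""
    return block >= offset

dd_bytes = dd_records_read * SECTOR_BYTES     # matches dd's "133222400 bytes"
gap = data_offset - max(bad_blocks)           # sectors between last bad block and data
none_in_data = not any(in_array_data(b) for b in bad_blocks)
print(dd_bytes, gap, none_in_data)
```

All six reported blocks fall before the data offset, which fits with the check passing over them without any ATA errors.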
--
With respect,
Roman
* Re: md resync ignoring unreadable sectors
2015-02-07 23:04 ` Roman Mamedov
2015-02-07 23:42 ` Phil Turmel
2015-02-07 23:43 ` Eyal Lebedinsky
@ 2015-02-07 23:49 ` Eyal Lebedinsky
2015-02-08 17:23 ` John Stoffel
3 siblings, 0 replies; 8+ messages in thread
From: Eyal Lebedinsky @ 2015-02-07 23:49 UTC (permalink / raw)
Cc: linux-raid
On 08/02/15 10:04, Roman Mamedov wrote:
> On Sun, 08 Feb 2015 09:39:47 +1100
> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>
>> The error did not show up during normal operation (or during scrub), only during the smartctl long test.
>> What triggered the error for you?
>
> Just appeared during boot-up after a reboot (after 50 days uptime) which was
> performed for some hardware upgrades (RAM, SATA controller). The error doesn't
> go away after swapping the SATA controller for a different one.
>
>> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
>> around the bad area also showed it to be all zeroes.
>
> I wouldn't expect mdadm to have any headers or unused areas as far as 133 MB
> into a RAID member.
>
You can also read the earlier discussion starting 26/Feb/14:
Subject: how to handle bad sectors in md control areas?
cheers
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: md resync ignoring unreadable sectors
2015-02-07 23:04 ` Roman Mamedov
` (2 preceding siblings ...)
2015-02-07 23:49 ` Eyal Lebedinsky
@ 2015-02-08 17:23 ` John Stoffel
3 siblings, 0 replies; 8+ messages in thread
From: John Stoffel @ 2015-02-08 17:23 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Eyal Lebedinsky, linux-raid
>>>>> "Roman" == Roman Mamedov <rm@romanrm.net> writes:
Roman> On Sun, 08 Feb 2015 09:39:47 +1100
Roman> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>> The error did not show up during normal operation (or during scrub), only during the smartctl long test.
>> What triggered the error for you?
Roman> Just appeared during boot-up after a reboot (after 50 days uptime) which was
Roman> performed for some hardware upgrades (RAM, SATA controller). The error doesn't
Roman> go away after swapping the SATA controller for a different one.
>> I looked up the size of the different parts of the RAID to arrive at that conclusion. Dumping the sectors
>> around the bad area also showed it to be all zeroes.
Roman> I wouldn't expect mdadm to have any headers or unused areas as
Roman> far as 133 MB into a RAID member.
Roman,
I would immediately add in a third RAID1 member, wait for it to
resync, then pull out the bad drive and write zeros to the entire
drive to force any and all bad sectors to get over-written and
hopefully reallocated from good sectors.
But I'd also treat the drive as suspect and replace it ASAP. Keep it
around as a scratch drive, or a temp space area you don't care about
if you like, but not for important data if at all possible.
John