Linux RAID subsystem development
From: Kay Diederichs <kay.diederichs@uni-konstanz.de>
To: Marc MERLIN <marc@merlins.org>, Phil Turmel <philip@turmel.org>
Cc: Andreas Klauer <Andreas.Klauer@metamorpher.de>,
	Adam Goryachev <mailinglists@websitemanagers.com.au>,
	Roger Heflin <rogerheflin@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: force remapping a pending sector in sw raid5 array
Date: Fri, 9 Feb 2018 21:52:38 +0100
Message-ID: <59144df0-35b7-942f-22c8-754afd0f89c4@uni-konstanz.de>
In-Reply-To: <20180209202958.6mieeomu5of45rjf@merlins.org>

On 09/02/18 at 21:29, Marc MERLIN wrote:
> On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
>>> The pending sectors should have been re-written and become
>>> Reallocated_Event_Count, no?
>>
>> Yes, and not necessarily.  Pending sectors can be non-permanent errors
>> -- the drive firmware will test a pending sector immediately after write
>> to see if the write is readable.  If not, it will re-allocate while it
>> still has the write data in its buffers.  Otherwise, it'll clear the
>> pending sector.
> 
> This shows the sector is still bad though, right? 
> 
> myth:~# hdparm --read-sector 1287409520 /dev/sdh
> /dev/sdh:
> reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
> 7000 0b54 92c4 ffff 0000 0000 01fe 0000
> (...)
> 
> [ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
> [ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
> [ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [ 2572.139431] ata5.04: status: { DRDY ERR }
> [ 2572.139435] ata5.04: error: { UNC }
> [ 2572.162369] ata5.04: configured for UDMA/133
> [ 2572.162414] ata5: EH complete
> 
> mdadm also said it found 6 bad sectors and rewrote them (or something like that)
> and it's happy. So allegedly it did something, but SMART does not agree (yet?)
> 
> I'm now running a long smart test on all drives, will see if numbers change.
> 
> Mmmh, and I just ran 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> below, and I don't quite understand what's going on.
> 
>>> So, mdadm is happy allegedly, but my drives still have the same bad
>>> sectors they had (more or less).
>>
>> If you have bad block lists enabled in your array, MD will *never* try
>> to fix the underlying sectors.  Please show your mdadm -E reports for
>> these devices.  If necessary, stop the array and re-assemble with the
>> options to disable bad block lists.  { How this misfeature got into the
>> kernel and enabled by default baffles me. }
> 
> This means I don't have bad block lists?
> myth:~# mdadm -E /dev/sdd e f g h all return
> /dev/sdd:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> 
>> Also, pending sectors that are in dead zones between metadata and array
>> data will not be accessed by a check scrub, and will therefore persist.
>  
> That's a good point, but then I would never have discovered those blocks
> while initializing the array.
> 
>>> Yes, I know I should trash (return) those drives,
>>
>> Well, non-permanent read errors are not considered warranty failures.
>> They are in the drive specs.  When pending is zero and actual
>> re-allocations are climbing (my threshold is double digits), *then* it's
>> time to replace.
> 
> I think it's worse here. Read errors are not being cleared by block rewrites?
> Those are brand "new" (but really remanufactured) drives. 
> So far I'm not liking what I'm seeing and I'm very close to just
> returning them all and getting some less dodgy ones.
> 
> Sad because the last set of 5 I got from a similar source, have worked
> beautifully.
> 
> Let's see what a full smart scan does.
> I may also use hdparm --write-sector to just fill those bad blocks with 0's
> now that it seems that mdadm isn't caring about/using them anymore?
> 
> Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?
> 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> /dev/sdh is apparently in use by the system; badblocks forced anyway.
> Checking for bad blocks in non-destructive read-write mode
> From block 1287409400 to 1287409599
> Checking for bad blocks (non-destructive read-write test)
> Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
> 1287409521ne, 0:18 elapsed. (1/0/0 errors)
> 1287409522ne, 0:23 elapsed. (2/0/0 errors)
> 1287409523ne, 0:27 elapsed. (3/0/0 errors)
> 1287409524ne, 0:31 elapsed. (4/0/0 errors)
> 1287409525ne, 0:36 elapsed. (5/0/0 errors)
> 1287409526ne, 0:40 elapsed. (6/0/0 errors)
> 1287409527ne, 0:44 elapsed. (7/0/0 errors)
> done                                                 
> Pass completed, 8 bad blocks found. (8/0/0 errors)
> 
> Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
> succeeded but that did nothing anyway?
> 
> Do I understand that
> 1) badblocks got read errors
> 2) it's supposed to rewrite the blocks with new data (or not?)
> 3) auto reallocate failed
> 
> 
> [ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
> [ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
> [ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3171.717031] ata5.04: status: { DRDY ERR } 
> [ 3171.717034] ata5.04: error: { UNC }
> [ 3171.718293] ata5.04: configured for UDMA/133
> [ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
> [ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed 
> [ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3171.718393] ata5: EH complete
> [ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
> [ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
> [ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in 
> [ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3176.092978] ata5.04: status: { DRDY ERR }
> [ 3176.092981] ata5.04: error: { UNC } 
> [ 3176.094237] ata5.04: configured for UDMA/133
> [ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
> [ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current] 
> [ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00 
> [ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3176.094324] ata5: EH complete
> [ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
> [ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
> [ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F> 
> [ 3180.488928] ata5.04: status: { DRDY ERR }
> [ 3180.488931] ata5.04: error: { UNC }
> [ 3180.490193] ata5.04: configured for UDMA/133
> [ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]  
> [ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3180.490290] ata5: EH complete 
> [ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
> [ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
> [ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in 
> [ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3184.873181] ata5.04: status: { DRDY ERR }
> [ 3184.873184] ata5.04: error: { UNC }
> [ 3184.874437] ata5.04: configured for UDMA/133
> [ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current] 
> [ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3184.874555] ata5: EH complete
> 

What you describe as the result of
badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
is the expected behavior. With -n, badblocks will _not_ write to sectors
that it cannot read (because overwriting them would remove any chance of
recovering their data with further read attempts).
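
As an aside, the numbers in your quoted kernel log are consistent with
the eight blocks that badblocks reported. The following is only a small
arithmetic sketch: it takes the LBA bytes (4c bc 4f 70) and the transfer
length (08) from the Read(16) CDB in your log, and assumes 512-byte
sectors and a 4096-byte page, which is what the "logical block" number in
the "Buffer I/O error" line appears to count. The variable names are mine.

```shell
#!/bin/sh
# LBA bytes copied from the quoted Read(16) CDB: "4c bc 4f 70"
lba=$((0x4cbc4f70))
# Transfer length from the same CDB: "00 00 00 08" (in 512-byte sectors)
count=$((0x08))
# Last sector covered by that single 8-sector read
last=$((lba + count - 1))
# Assumption: 8 sectors of 512 bytes per 4096-byte page
page_block=$((lba / 8))
echo "first=$lba last=$last page_block=$page_block"
# prints: first=1287409520 last=1287409527 page_block=160926190
```

So one failed 4096-byte read covers sectors 1287409520 through
1287409527, which matches the range badblocks listed.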

As I wrote, you have to use the -w option instead of -n, and give the
last and first block as 1287409527 1287409520.
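
Spelled out, and only as a dry-run sketch: the snippet below builds the
command line and prints it without executing anything. It assumes you
keep the other flags from your earlier invocation (-f, -s, -v, -b512) and
only swap -n for -w; the variable names are mine. Note that -w overwrites
the listed sectors with test patterns, so whatever they held is gone
afterwards.

```shell
#!/bin/sh
# Dry run: this only prints the command, it does not run badblocks.
dev=/dev/sdh
first=1287409520   # first unreadable sector reported by badblocks
last=1287409527    # last unreadable sector reported by badblocks
# badblocks expects the last block before the first block;
# -b512 makes its block numbers equal to 512-byte sector numbers.
cmd="badblocks -fsvwb512 $dev $last $first"
echo "$cmd"
# prints: badblocks -fsvwb512 /dev/sdh 1287409527 1287409520
```

Check the printed line against the device and range you actually mean
before running it by hand.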

HTH
Kay


