Linux RAID subsystem development
From: Marc MERLIN <marc@merlins.org>
To: Phil Turmel <philip@turmel.org>
Cc: Kay Diederichs <kay.diederichs@uni-konstanz.de>,
	Andreas Klauer <Andreas.Klauer@metamorpher.de>,
	Adam Goryachev <mailinglists@websitemanagers.com.au>,
	Roger Heflin <rogerheflin@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: force remapping a pending sector in sw raid5 array
Date: Fri, 9 Feb 2018 12:29:58 -0800
Message-ID: <20180209202958.6mieeomu5of45rjf@merlins.org>
In-Reply-To: <1227ce39-31af-22f2-f4fa-de85466f05c7@turmel.org>

On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
> > The pending sectors should have been re-written and become
> > Reallocated_Event_Count, no?
> 
> Yes, and not necessarily.  Pending sectors can be non-permanent errors
> -- the drive firmware will test a pending sector immediately after write
> to see if the write is readable.  If not, it will re-allocate while it
> still has the write data in its buffers.  Otherwise, it'll clear the
> pending sector.

This shows the sector is still bad though, right? 

myth:~# hdparm --read-sector 1287409520 /dev/sdh
/dev/sdh:
reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
7000 0b54 92c4 ffff 0000 0000 01fe 0000
(...)
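
For reference, the sense buffer hdparm printed can be decoded by hand. The
following is a minimal sketch, assuming the buffer is fixed-format SCSI sense
data (response code 0x70) and using only the first 14 bytes shown above. The
function name and the two small lookup tables are illustrative, not taken
from any tool, and the tables list only the codes seen in this thread.

```python
# First 14 bytes of the sb[] dump printed by hdparm above.
SENSE_HEX = "70 00 03 00 00 00 00 0a 40 51 e0 01 11 04"

# Deliberately incomplete tables: only the values relevant here.
SENSE_KEYS = {0x3: "Medium Error"}
ASC_ASCQ = {(0x11, 0x04): "Unrecovered read error - auto reallocate failed"}


def decode_fixed_sense(hex_bytes):
    """Return (sense key, ASC, ASCQ) from fixed-format sense data."""
    sb = bytes.fromhex(hex_bytes)
    if sb[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = sb[2] & 0x0F   # low nibble of byte 2
    asc, ascq = sb[12], sb[13]  # additional sense code and qualifier
    return sense_key, asc, ascq


key, asc, ascq = decode_fixed_sense(SENSE_HEX)
print(SENSE_KEYS.get(key, hex(key)), "/",
      ASC_ASCQ.get((asc, ascq), (hex(asc), hex(ascq))))
# prints: Medium Error / Unrecovered read error - auto reallocate failed
```

Under that assumption the read did fail with a medium error, even though
hdparm ends the line with "succeeded".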

[ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
[ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
[ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
[ 2572.139431] ata5.04: status: { DRDY ERR }
[ 2572.139435] ata5.04: error: { UNC }
[ 2572.162369] ata5.04: configured for UDMA/133
[ 2572.162414] ata5: EH complete
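
As a cross-check, the LBA in the libata "cmd"/"res" lines can be recomputed
from the taskfile bytes. This is a small sketch, assuming the usual libata
layout command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device;
the helper name is made up for illustration.

```python
def lba_from_taskfile(tf):
    """Rebuild the 48-bit LBA from a libata cmd/res string."""
    _cmd, low, high, _device = tf.split("/")
    _feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, _hnsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # Low three bytes come from the first group, high three from the HOB group.
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


print(lba_from_taskfile("24/00:01:70:4f:bc/00:00:4c:00:00/e0"))  # 1287409520
print(lba_from_taskfile("51/40:01:70:4f:bc/00:00:4c:00:00/e0"))  # 1287409520
```

Both the failed command and the result point at sector 1287409520, the same
sector passed to hdparm --read-sector.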

mdadm also said it found 6 bad sectors and rewrote them (or something like that)
and it's happy. So allegedly it did something, but SMART does not agree (yet?)

I'm now running a long SMART test on all drives, will see if numbers change.

Mmmh, and I just ran
myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
(output below), and I don't quite understand what's going on.

> > So, mdadm is happy allegedly, but my drives still have the same bad
> > sectors they had (more or less).
> 
> If you have bad block lists enabled in your array, MD will *never* try
> to fix the underlying sectors.  Please show your mdadm -E reports for
> these devices.  If necessary, stop the array and re-assemble with the
> options to disable bad block lists.  { How this misfeature got into the
> kernel and enabled by default baffles me. }

Does this mean I don't have bad block lists?
myth:~# mdadm -E on /dev/sdd, sde, sdf, sdg and sdh all return the same thing:
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

> Also, pending sectors that are in dead zones between metadata and array
> data will not be accessed by a check scrub, and will therefore persist.
 
That's a good point, but then I would never have discovered those blocks
while initializing the array.

> > Yes, I know I should trash (return) those drives,
> 
> Well, non-permanent read errors are not considered warranty failures.
> They are in the drive specs.  When pending is zero and actual
> re-allocations are climbing (my threshold is double digits), *then* it's
> time to replace.

I think it's worse here. Read errors are not being cleared by block rewrites?
Those are brand "new" (but really remanufactured) drives. 
So far I'm not liking what I'm seeing and I'm very close to just
returning them all and getting some less dodgy ones.

Sad, because the last set of 5 I got from a similar source has worked
beautifully.

Let's see what a full SMART scan does.
I may also use hdparm --write-sector to just fill those bad blocks with 0's
now that it seems that mdadm isn't caring about/using them anymore?

Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?

myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
/dev/sdh is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 1287409400 to 1287409599
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
1287409521ne, 0:18 elapsed. (1/0/0 errors)
1287409522ne, 0:23 elapsed. (2/0/0 errors)
1287409523ne, 0:27 elapsed. (3/0/0 errors)
1287409524ne, 0:31 elapsed. (4/0/0 errors)
1287409525ne, 0:36 elapsed. (5/0/0 errors)
1287409526ne, 0:40 elapsed. (6/0/0 errors)
1287409527ne, 0:44 elapsed. (7/0/0 errors)
done                                                 
Pass completed, 8 bad blocks found. (8/0/0 errors)

Badblocks found 8 bad blocks, but either didn't rewrite them, or failed to, or
succeeded but that did nothing anyway?

Do I understand correctly that:
1) badblocks got read errors
2) it's supposed to rewrite the blocks with new data (or not?)
3) auto reallocate failed
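
One possible reading of the (8/0/0 errors) line is 8 read errors, 0 write
errors and 0 corruption errors, i.e. the bad blocks were never written at
all. A non-destructive test has to read a block before it can overwrite and
restore it, so a block that cannot be read has nothing to save. The toy model
below sketches that idea only. It is not badblocks' actual code, FakeDisk and
nondestructive_pass are hypothetical names, and the rule that a write clears
a pending block is an assumption made for the simulation.

```python
class FakeDisk:
    """Toy block device: 'pending' blocks fail on read until written."""

    def __init__(self, nblocks, pending):
        self.data = [bytes(512) for _ in range(nblocks)]
        self.pending = set(pending)
        self.writes = []  # log of block numbers that were written

    def read(self, n):
        if n in self.pending:
            raise OSError("media error")
        return self.data[n]

    def write(self, n, buf):
        self.writes.append(n)
        self.pending.discard(n)  # assumption: a write clears the pending state
        self.data[n] = buf


def nondestructive_pass(disk, first, last, pattern=b"\xaa" * 512):
    """Read, overwrite with a pattern, verify, then restore each block."""
    read_errors, corruption = [], []
    for n in range(first, last + 1):
        try:
            saved = disk.read(n)
        except OSError:
            read_errors.append(n)
            continue  # original data unknown, so the block is left alone
        disk.write(n, pattern)
        if disk.read(n) != pattern:
            corruption.append(n)
        disk.write(n, saved)  # put the original contents back
    return read_errors, corruption


disk = FakeDisk(16, pending={8, 9})
errors, corrupt = nondestructive_pass(disk, 0, 15)
print(errors, sorted(disk.pending))   # [8, 9] [8, 9]

disk.write(8, bytes(512))             # an explicit overwrite of one bad block
print(sorted(disk.pending))           # [9]
```

In this model only an explicit write (what hdparm --write-sector or a
destructive test would do) touches the unreadable blocks.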


[ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
[ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED 
[ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
[ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3171.717031] ata5.04: status: { DRDY ERR } 
[ 3171.717034] ata5.04: error: { UNC }
[ 3171.718293] ata5.04: configured for UDMA/133
[ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
[ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed 
[ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520 
[ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3171.718393] ata5: EH complete
[ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
[ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
[ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in 
[ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3176.092978] ata5.04: status: { DRDY ERR }
[ 3176.092981] ata5.04: error: { UNC } 
[ 3176.094237] ata5.04: configured for UDMA/133
[ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
[ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current] 
[ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00 
[ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
[ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3176.094324] ata5: EH complete
[ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
[ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED 
[ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
[ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F> 
[ 3180.488928] ata5.04: status: { DRDY ERR }
[ 3180.488931] ata5.04: error: { UNC }
[ 3180.490193] ata5.04: configured for UDMA/133
[ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]  
[ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520 
[ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3180.490290] ata5: EH complete 
[ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
[ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
[ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in 
[ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
[ 3184.873181] ata5.04: status: { DRDY ERR }
[ 3184.873184] ata5.04: error: { UNC }
[ 3184.874437] ata5.04: configured for UDMA/133
[ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current] 
[ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
[ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
[ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
[ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
[ 3184.874555] ata5: EH complete
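
The Read(16) CDB in the log above can be decoded to see exactly what the
kernel asked for. This is a minimal sketch, assuming the standard READ(16)
layout (opcode 0x88, 8-byte big-endian LBA at bytes 2-9, 4-byte transfer
length at bytes 10-13); the function name is illustrative.

```python
CDB_HEX = "88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00"


def decode_read16(cdb_hex):
    """Return (starting LBA, number of blocks) from a READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    nblocks = int.from_bytes(cdb[10:14], "big")
    return lba, nblocks


lba, nblocks = decode_read16(CDB_HEX)
print(lba, nblocks)        # 1287409520 8
print(lba % 8, lba // 8)   # 0 160926190
print(lba + nblocks - 1)   # 1287409527
```

Each failing request is one aligned 8-sector (4096-byte) read covering
1287409520 through 1287409527, which is logical block 160926190 in 4096-byte
units and the same range badblocks reported. That is consistent with the
whole 4096-byte read failing as a unit, so this log alone does not show how
many of the eight 512-byte sectors are individually unreadable.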

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
