From: Marc MERLIN <marc@merlins.org>
To: Wol's lists <antlists@youngman.org.uk>
Cc: Phil Turmel <philip@turmel.org>,
	Kay Diederichs <kay.diederichs@uni-konstanz.de>,
	Andreas Klauer <Andreas.Klauer@metamorpher.de>,
	Adam Goryachev <mailinglists@websitemanagers.com.au>,
	Roger Heflin <rogerheflin@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: force remapping a pending sector in sw raid5 array
Date: Fri, 9 Feb 2018 14:36:14 -0800
Message-ID: <20180209223613.GO9565@merlins.org>
In-Reply-To: <3345d6f0-80d0-bfaf-9974-a7472d499117@youngman.org.uk>

On Fri, Feb 09, 2018 at 10:07:57PM +0000, Wol's lists wrote:
> On 09/02/18 21:22, Marc MERLIN wrote:
> >Interesting. I figured once a sector went pending once, it would not
> >actually be re-used and be remapped on the next write. Seems like it
> >didn't happen here.
> 
> Because there's all sorts of reasons a sector can go pending.
> 
> My favourite example is to compare it to DRAM. DRAM needs refreshing 
> every couple of seconds, otherwise it loses its contents and cannot be 
> read, but it's perfectly okay to rewrite and re-use it.
 
You're correct. The density of drives is so high now that writing a block
affects the ones around it.

> Likewise, the magnetism in a drive can decay such that the data is 
> unreadable, but there's nothing actually wrong with the drive. (If the 
> data next door is repeatedly rewritten, the rewrite can "leak" and trash 
> nearby data ...) The decay time for that should be years.

Right. That's why I'm unhappy that it happened within a week of unpacking
the drives, and that 2 out of 5 already have problems.

> The problem of course is when the problem has a decay time measured in 
> minutes or hours. The rewrite succeeds, so the sector doesn't get 
> remapped, but when you next read it it has died :-(

Speaking of which, I still haven't gotten the drive to actually remap
anything.
On that 2nd drive, I'm seeing 7 pending sectors, and I can't trigger any
error or remapping on them:
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

# 1  Short offline       Completed: read failure       90%       519         569442000
# 2  Short offline       Completed: read failure       90%       519         569442000
# 3  Extended offline    Completed: read failure       90%       518         569442000
# 4  Short offline       Completed without error       00%       508         -
# 5  Short offline       Completed without error       00%       484         -
# 6  Short offline       Completed without error       00%       460         -
# 7  Short offline       Completed without error       00%       436         -
# 8  Short offline       Completed: read failure       90%       413         569441985
# 9  Extended offline    Completed: read failure       90%       409         569441990
#10  Extended offline    Completed: read failure       90%       409         569441985
#11  Extended offline    Completed: read failure       90%       409         569441991
#12  Extended offline    Completed: read failure       90%       409         569441985
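
Each self-test entry records only the LBA of the first error it hit, so it can help to collapse the log into the set of distinct failing LBAs. The following is a minimal Python sketch, not part of the original message: the sample lines are copied from the log above, while the regular expression and the name `failing_lbas` are illustrative assumptions about the column layout shown here.

```python
import re

# Sample entries copied from the self-test log above (one per distinct LBA,
# plus one passing entry to show that it is skipped).
LOG = """\
# 1  Short offline       Completed: read failure       90%       519         569442000
# 4  Short offline       Completed without error       00%       508         -
# 8  Short offline       Completed: read failure       90%       413         569441985
# 9  Extended offline    Completed: read failure       90%       409         569441990
#11  Extended offline    Completed: read failure       90%       409         569441991
"""

# Assumed layout: entry number, description, "read failure" status,
# remaining percentage, lifetime hours, LBA of first error.
ROW = re.compile(r"^#\s*(\d+)\s+.*read failure\s+(\d+)%\s+(\d+)\s+(\d+)\s*$")


def failing_lbas(log_text):
    """Return the sorted distinct LBA_of_first_error values from failed tests."""
    lbas = set()
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            lbas.add(int(match.group(4)))
    return sorted(lbas)


print(failing_lbas(LOG))
```

Run over the full log above, this gives four distinct LBAs, fewer than the 7 pending sectors. That would be consistent with each test stopping at its first read failure, though the log alone does not prove it.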

So, running badblocks over that range should help, right?

But no, I get nothing:
myth:~# badblocks -fsvn -b512 /dev/sdf  569942000 569001000
/dev/sdf is apparently in use by the system; badblocks forced anyway.
Checking for bad blocks in non-destructive read-write mode
From block 569001000 to 569942000
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

Unless I'm reading the wrong blocks, that would mean the blocks are good again?
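
One way to rule out "reading the wrong blocks" is to check the reported LBAs against the range that badblocks printed. This is a rough Python sketch under stated assumptions: the SMART LBAs are taken to be in 512-byte sectors, the last block is treated as inclusive, and the helper names `badblocks_range` and `covered` are made up for illustration.

```python
def badblocks_range(last_block, first_block, block_size=512):
    """Convert badblocks arguments to an inclusive range of 512-byte LBAs.

    The argument order mirrors the badblocks command line, where the last
    block is given before the first block.
    """
    if first_block > last_block:
        raise ValueError("first_block must not exceed last_block")
    if block_size % 512 != 0:
        raise ValueError("block_size must be a multiple of 512")
    sectors_per_block = block_size // 512
    first_lba = first_block * sectors_per_block
    # Assumption: the last block itself is tested, so include all its sectors.
    last_lba = (last_block + 1) * sectors_per_block - 1
    return first_lba, last_lba


def covered(lbas, lba_range):
    """Map each LBA to True if it lies inside the inclusive range."""
    low, high = lba_range
    return {lba: low <= lba <= high for lba in lbas}


# LBAs reported by the self-tests, and the arguments from the run above.
reported = [569441985, 569441990, 569441991, 569442000]
tested = badblocks_range(569942000, 569001000, block_size=512)
print(tested)
print(covered(reported, tested))
```

With `-b512` the tested range is 569001000 to 569942000, so all four reported LBAs fall inside it, if the 512-byte assumption holds for this drive.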

But SMART still shows:
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7

and a short offline test immediately shows
# 1  Short offline       Completed: read failure       90%       519         569442000

Clearly, I still have some things to learn.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
