Re-map disk sectors in userspace when rewriting after read errors

linux-raid.vger.kernel.org archive mirror

* Re-map disk sectors in userspace when rewriting after read errors
@ 2009-09-15  6:23 Matthias Urlichs
  2009-09-15  6:45 ` berk walker
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15  6:23 UTC (permalink / raw)
  To: linux-raid

Hi,

my problem is that I have a bunch of crappy disks which seem unable to
reliably remap bad areas after a read error.

This obviously makes the read error rewrite feature of our beloved 
RAID5/6 code somewhat less than useful.

What I would like to do is to re-map these sectors in userspace -- either 
by browbeating the disk into it, or by using the Device Mapper. So I'd 
need a way to tell a userspace daemon "this device+block is unreadable", 
and wait until said daemon tells the RAID core to go ahead.

I can do the userspace side easily, but my time to dig through the RAID 
code and implement that sort of channel in a maintainable way is somewhat 
limited. (Plus, I need that code sooner rather than later.)

Would somebody be able to help out? There may be some money in it ...

-- 
Matthias Urlichs


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  6:23 Re-map disk sectors in userspace when rewriting after read errors Matthias Urlichs
@ 2009-09-15  6:45 ` berk walker
  2009-09-15  7:23   ` Matthias Urlichs
  2009-09-15  7:13 ` Alex Butcher
  2009-09-15 10:40 ` Majed B.
  2 siblings, 1 reply; 33+ messages in thread
From: berk walker @ 2009-09-15  6:45 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Matthias Urlichs wrote:
> Hi,
>
> my problem is that I have a bunch of crappy disks which seem unable to
> reliably remap bad areas after a read error.
>
> This obviously makes the read error rewrite feature of our beloved 
> RAID5/6 code somewhat less than useful.
>
> What I would like to do is to re-map these sectors in userspace -- either 
> by browbeating the disk into it, or by using the Device Mapper. So I'd 
> need a way to tell a userspace daemon "this device+block is unreadable", 
> and wait until said daemon tells the RAID core to go ahead.
>
> I can do the userspace side easily, but my time to dig through the RAID 
> code and implement that sort of channel in a maintainable way is somewhat 
> limited. (Plus, I need that code sooner rather than later.)
>
> Would somebody be able to help out? There may be some money in it ...
>
>   
I can not believe the question.  What file system might this be?





* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  6:23 Re-map disk sectors in userspace when rewriting after read errors Matthias Urlichs
  2009-09-15  6:45 ` berk walker
@ 2009-09-15  7:13 ` Alex Butcher
  2009-09-15  7:29   ` Matthias Urlichs
  2009-09-15 10:40 ` Majed B.
  2 siblings, 1 reply; 33+ messages in thread
From: Alex Butcher @ 2009-09-15  7:13 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

On Tue, 15 Sep 2009, Matthias Urlichs wrote:

> my problem is that I have a bunch of crappy disks which seem unable to
> reliably remap bad areas after a read error.

IME, discs don't remap after read errors, only on writes.

> This obviously makes the read error rewrite feature of our beloved
> RAID5/6 code somewhat less than useful.

Are you sure that refresh-writes triggered by read errors are expected
behaviour of md's RAID5/6 mode? Only they weren't for RAID1 until somewhat
recently (2.6.15, IIRC).

Best Regards,
Alex


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  6:45 ` berk walker
@ 2009-09-15  7:23   ` Matthias Urlichs
  0 siblings, 0 replies; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15  7:23 UTC (permalink / raw)
  To: linux-raid

On Tue, 15 Sep 2009 02:45:29 -0400, berk walker wrote:

> I can not believe the question.  What file system might this be?

Umm, what's your problem with my question?

And why would it matter which file system I'm using?

_My_ problem is that I have a bunch of disks which are not as reliable as 
I'd like. Yes I could go and buy a new heap of 1TB disks, but frankly I'd 
like to avoid that. These disks are "good enough" for the data that's on 
them. I'll replace one if it fails entirely -- assuming that I can 
rebuild the RAID6 array when I do that. However, since the rewrite-after-
read code has caused bad sectors to accumulate on all of these disks, I 
can't even do that at the moment.

(And, since there's no command which knows how to recover bad spots from 
the other RAID disks yet (I hope to be able to work on _that_ problem 
next week), I can't even use ddrescue to copy one almost-good disk to a 
new one.)

-- 
Matthias Urlichs


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  7:13 ` Alex Butcher
@ 2009-09-15  7:29   ` Matthias Urlichs
  2009-09-15  7:37     ` Alex Butcher
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15  7:29 UTC (permalink / raw)
  To: linux-raid

On Tue, 15 Sep 2009 08:13:08 +0100, Alex Butcher wrote:

> IME, discs don't remap after read errors, only on writes.

Some may remap after recoverable read errors. However, the RAID code
does (I assume - see below) rewrite the data -- which the disk happily 
acknowledges  -- only to report the very same error next time that spot's 
being read. :-(

> Are you sure that refresh-writes triggered by read errors are expected
> behaviour of md's RAID5/6 mode?

Not 100%, no -- but recovering the data but otherwise ignoring the error
(other than incrementing the error counter) would be a level of foolishness
I won't assume of the RAID code's authors.

-- 
Matthias Urlichs


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  7:29   ` Matthias Urlichs
@ 2009-09-15  7:37     ` Alex Butcher
  2009-09-15 10:48       ` Matthias Urlichs
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Butcher @ 2009-09-15  7:37 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

On Tue, 15 Sep 2009, Matthias Urlichs wrote:

> On Tue, 15 Sep 2009 08:13:08 +0100, Alex Butcher wrote:
>
>> IME, discs don't remap after read errors, only on writes.
>
> Some may remap after recoverable read errors. However, the RAID code
> does (I assume - see below) rewrite the data -- which the disk happily
> acknowledges  -- only to report the very same error next time that spot's
> being read. :-(

Odd. If I hadn't observed something similar myself with a 40G Maxtor
(badblocks -w fails, wipe with dd if=/dev/zero, badblocks -w succeeds,
badblocks -w fails again), I wouldn't believe it.  SMART seems to think that
it's nowhere near the reallocated sector count threshold.  The only
conclusion I can come to is that the firmware is trash, or being way too
forgiving of inconsistently-performing spinning media.  Either way, it's not
suitable for data I even care a little bit about.

What does SMART say about reallocated and pending sectors on your disks? If
the reallocated threshold has been crossed, this might be the failure mode,
I guess.

What make/model are they?

>> Are you sure that refresh-writes triggered by read errors are expected
>> behaviour of md's RAID5/6 mode?
>
> Not 100%, no -- but recovering the data but otherwise ignoring the error
> (other than incrementing the error counter) would be a level of foolishness
> I won't assume of the RAID code's authors.

Well, the RAID code had been in the kernel and was being used in production
systems for quite some time before 2.6.15 came along. It took a BSD user to
point it out and a read through the kernel source for me to believe it...

Cheers,
Alex


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  6:23 Re-map disk sectors in userspace when rewriting after read errors Matthias Urlichs
  2009-09-15  6:45 ` berk walker
  2009-09-15  7:13 ` Alex Butcher
@ 2009-09-15 10:40 ` Majed B.
  2009-09-15 10:52   ` Matthias Urlichs
  2 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 10:40 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Hello,

I'm facing a similar problem now with 2 disks. The
Current_Pending_Sectors and Offline_Uncorrectable are higher than 100,
on a RAID5. SMART monitoring tools failed to report these after each
test so now I'm battling through...

I'm running the array degraded and yesterday while trying to copy the
data to another array (5.5TB), one disk jumped out of the dodgy array
and caused I/O errors... It won't even resync to another disk beyond
15.6%.

Currently, I'm cloning with dd_rescue and hoping to be able to copy
most of the data, and accept some data loss...

Would anyone suggest a better solution?

P.S.: The disks in question are WD, model: WDC WD10EACS-00ZJB0. I have
other WD disks and they're intact and have zero bad sectors...

-- 
       Majed B.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15  7:37     ` Alex Butcher
@ 2009-09-15 10:48       ` Matthias Urlichs
  2009-09-16  9:41         ` Goswin von Brederlow
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 10:48 UTC (permalink / raw)
  To: linux-raid

On Tue, 15 Sep 2009 08:37:52 +0100, Alex Butcher wrote:

> Either way, it's not
> suitable for data I even care a little bit about.

Ordinarily I'd agree with you. In this case, however, the data is mostly 
read-only and on backup media. So I don't really care if the disks fall 
off the edge of a cliff; the data will survive.

I can justify a moderate amount of time working on this, with the 
hardware I have. I can't really justify buying eight new disks.

NB: Please don't dismiss this kind of setup out of hand. I know that 
disks are cheap enough these days that the typical professional user 
won't ever need to worry about not being able to replace hardware which 
behaves like this. However, many people happen to be in a different 
situation. :-/

-- 
Matthias Urlichs


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 10:40 ` Majed B.
@ 2009-09-15 10:52   ` Matthias Urlichs
  2009-09-15 11:03     ` Majed B.
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 10:52 UTC (permalink / raw)
  To: linux-raid

On Tue, 15 Sep 2009 13:40:44 +0300, Majed B. wrote:

> Would anyone suggest a better solution?

You should tell ddrescue to log which sectors it failed to copy. You can 
then recover the missing data by reading the stuff at that offset from 
the other disks, and XORing the bytes.
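
For RAID5, the recovery described above comes down to XORing the blocks read at the same offset from every other member. A minimal sketch in Python, assuming all members share the same data offset and that `blocks` holds the bytes from each surviving disk (data and parity alike). The function name is made up for illustration, and RAID6 recovery via Q is not covered:

```python
from functools import reduce


def xor_reconstruct(blocks):
    """Rebuild a missing RAID5 block from the surviving blocks.

    `blocks` is a list of equal-length bytes objects: the data read at the
    same offset from every other member of the stripe (parity included).
    """
    if not blocks:
        raise ValueError("need at least one surviving block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # XOR byte-wise across all surviving members.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))
```

This only works when no other member has an unreadable sector at the same offset.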

I plan to write a program which does that (and which also understands 
RAID1 and RAID6). How long can you survive without your data?

-- 
Matthias Urlichs


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 10:52   ` Matthias Urlichs
@ 2009-09-15 11:03     ` Majed B.
  2009-09-15 17:02       ` Majed B.
  0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 11:03 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

I've been trying to migrate for 2 weeks. I can wait another 2 ... maybe 3 weeks.

Just to be clear, I'm using dd_rescue, not ddrescue (the latter is the GNU
one). I read about the log option but forgot to use it... now I've wasted
over 20 hours... ugh ... /smacks self

That would be a very useful program for cases like this!

-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 11:03     ` Majed B.
@ 2009-09-15 17:02       ` Majed B.
  2009-09-15 18:05         ` Matthias Urlichs
  0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 17:02 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Matthias,

Out of curiosity, how will you find the sectors/blocks that
reconstruct a certain bad sector? Is the data spread to the same block
number on all disks?

-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 17:02       ` Majed B.
@ 2009-09-15 18:05         ` Matthias Urlichs
  2009-09-15 18:14           ` Majed B.
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 18:05 UTC (permalink / raw)
  To: Majed B.; +Cc: linux-raid

On Tue, 2009-09-15 at 20:02 +0300, Majed B. wrote:
> Out of curiosity, how will you find the sectors/blocks that
> reconstruct a certain bad sector? Is the data spread to the same block
> number on all disks?

Yes. It's a byte-level operation, actually.

The only part that's moderately tricky is, on RAID6, to determine which
partition the Q drive is. Fortunately, mdadm already contains (almost)
all the necessary logic.
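
To illustrate the lookup mentioned above, here is a sketch of locating the P and Q members for a given stripe. It assumes the left-symmetric layout and follows the formulas as remembered from the kernel's raid5 code, so it should be verified against mdadm's own logic before being relied on. `stripe` is taken to be the chunk row, i.e. the member-relative sector divided by the chunk size in sectors; the function name is hypothetical:

```python
def parity_disks(stripe, raid_disks, level):
    """Return (p_index, q_index) for a stripe; q_index is None on RAID5.

    Assumes the left-symmetric layout: parity starts on the last member
    and moves one member to the left with each stripe.
    """
    if level == 5:
        return raid_disks - 1 - (stripe % raid_disks), None
    if level == 6:
        p = raid_disks - 1 - (stripe % raid_disks)
        # Under this assumption Q sits directly after P, wrapping around.
        return p, (p + 1) % raid_disks
    raise ValueError("only RAID5 and RAID6 are handled")
```

Other layouts (left-asymmetric, right-symmetric and so on) place parity differently and would need their own cases.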



* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 18:05         ` Matthias Urlichs
@ 2009-09-15 18:14           ` Majed B.
  2009-09-15 18:44             ` Matthias Urlichs
  0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 18:14 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Hmm, so I guess I'm luckier since I run RAID5? (or not because I have
2 bad disks? :p)

When do you expect to have a working application done, by the way?

-- 
       Majed B.


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 18:14           ` Majed B.
@ 2009-09-15 18:44             ` Matthias Urlichs
  2009-09-16  9:31               ` Majed B.
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 18:44 UTC (permalink / raw)
  To: Majed B.; +Cc: linux-raid

On Tue, 2009-09-15 at 21:14 +0300, Majed B. wrote:
> Hmm, so I guess I'm luckier since I run RAID5? (or not because I have
> 2 bad disks? :p)
> 
Well, depends on whether you have two errors in the same sector. If not,
you're going to be lucky.
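
Whether two disks have errors in the same place can be checked by intersecting their bad ranges. A small sketch, assuming each list holds `(start, length)` tuples in sectors, already translated to offsets relative to the start of the array data on each member (the function name is illustrative only):

```python
def overlapping_ranges(bad_a, bad_b):
    """Return the sector ranges that are bad on both disks.

    Each argument is a list of (start, length) tuples in sectors,
    relative to the start of the array data on that member.
    """
    overlaps = []
    for a_start, a_len in bad_a:
        for b_start, b_len in bad_b:
            start = max(a_start, b_start)
            end = min(a_start + a_len, b_start + b_len)
            if start < end:
                overlaps.append((start, end - start))
    return sorted(overlaps)
```

An empty result means every bad sector on one disk can, in principle, be rebuilt from the others on RAID5.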

> When do you expect to have a working application done, by the way?
> 
Hopefully later this week. It'll probably be a patch to mdadm's
development branch of some sort.

Neil: In order to do that, I need to read badblock map files for some
(or all) disks, in GNU ddrescue's format preferably. Do you have a
preference WRT how to tell mdadm about these?

I tend towards "mdadm --recover 0:foo,2:bar DISK_DEVICE...". This would
tell mdadm that the badblock map for disk 0 is in file 'foo', the map
for disk 2 is in 'bar', and the other disks are supposed to be cleanly
read/writeable.

mdadm would then read RAID info from these devices, make sure it's
consistent (or "mostly consistent" if using --force), read the bad block
map, recover the data that's indicated to be bad and write it to the
partitions in question, and zero out the blocks that are unrecoverable
(and restore P+Q vectors for them).
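
As a rough idea of the "read the bad block map" step, the following Python sketch extracts the bad regions from a mapfile. It assumes the GNU ddrescue mapfile layout (`#` comment lines, one status line, then `pos size status` lines in hexadecimal, with `-` marking bad sectors); dd_rescue's bad-block log is a different format and would need its own parser. The `--recover` option above is a proposal from this thread, not an existing mdadm flag, and `bad_regions` is a made-up helper name:

```python
def bad_regions(mapfile_text):
    """Return [(offset, size)] in bytes for regions ddrescue marked bad ('-')."""
    lines = [
        line.split()
        for line in mapfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    regions = []
    # The first non-comment line is the current-position status line; skip it.
    for fields in lines[1:]:
        pos, size, status = fields[:3]
        if status == "-":
            regions.append((int(pos, 0), int(size, 0)))
    return regions
```

Regions marked as untried or untrimmed are ignored here; a real tool would probably want to treat them as suspect too.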


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 18:44             ` Matthias Urlichs
@ 2009-09-16  9:31               ` Majed B.
  2009-09-16  9:44                 ` Matthias Urlichs
  0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-16  9:31 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Matthias,

I have a question which would probably sound stupid: If I have a bad
blocks output file from dd_rescue, can I reconstruct a bad sector's
data by reading the same sector from all disks (using dd if=/dev/sdx
of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run a
normal XOR operation, write zeros to the bad block to force sector
remap, then dd the XOR output to the said sector?

-- 
       Majed B.


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-15 10:48       ` Matthias Urlichs
@ 2009-09-16  9:41         ` Goswin von Brederlow
  2009-09-16 13:13           ` Matthias Urlichs
  0 siblings, 1 reply; 33+ messages in thread
From: Goswin von Brederlow @ 2009-09-16  9:41 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

Matthias Urlichs <matthias@urlichs.de> writes:

> On Tue, 15 Sep 2009 08:37:52 +0100, Alex Butcher wrote:
>
>> Either way, it's not
>> suitable for data I even care a little bit about.
>
> Ordinarily I'd agree with you. In this case, however, the data is mostly 
> read-only and on backup media. So I don't really care if the disks fall 
> off the edge of a cliff; the data will survive.
>
> I can justify a moderate amount of time working on this, with the 
> hardware I have. I can't really justify buying eight new disks.
>
> NB: Please don't dismiss this kind of setup out of hand. I know that 
> disks are cheap enough these days that the typical professional user 
> won't ever need to worry about not being able to replace hardware which 
> behaves like this. However, many people happen to be in a different 
> situation. :-/

How about making it re-read repaired blocks so it catches when the
disk didn't remap?

I'm assuming the following happens:

1) disk read fails
2) raid rebuilds the block from parity
3) raid writes block to bad disk
4) disk writes data to the old block and fails to detect a write error
   that would trigger a remapping
5) re-read of the data succeeds because the data is still in the
   drive's disk cache
6) later read of the data fails because nothing was remapped

So you would need to write some repair-check-daemon that remembers
repaired blocks, waits for enough data to have passed through the
drive to flush the disk cache and then retries the block again.
And again and again till it stops giving errors.
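
The bookkeeping for such a daemon could look roughly like this Python sketch. It assumes the caller supplies `read_block(device, sector)`, a hypothetical callable that returns True on a clean read and is responsible for actually reaching the platter (bypassing the page cache and waiting out the drive's cache); the class and its names are illustrative, not an existing tool:

```python
class RepairChecker:
    """Track rewritten blocks until each has been re-read cleanly enough times."""

    def __init__(self, read_block, required_clean_reads=3):
        self.read_block = read_block          # callable(device, sector) -> bool
        self.required = required_clean_reads
        self.pending = {}                     # (device, sector) -> clean reads so far

    def remember(self, device, sector):
        """Record a block that the RAID layer has just rewritten."""
        self.pending[(device, sector)] = 0

    def recheck(self):
        """Re-read every pending block once; return those that failed again."""
        failed = []
        for key in list(self.pending):
            if self.read_block(*key):
                self.pending[key] += 1
                if self.pending[key] >= self.required:
                    del self.pending[key]
            else:
                # Start counting again; the caller should trigger another rewrite.
                self.pending[key] = 0
                failed.append(key)
        return failed
```

Blocks returned by `recheck()` are the ones where the rewrite evidently did not stick.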


Alternatively write a re-map device-mapper target that reserves some
space of the disk and remaps bad blocks itself.
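
Short of writing a new target, a similar effect can be approximated with the existing `linear` device-mapper target. The Python sketch below only generates the table text that could be fed to `dmsetup create`; the device name, the spare-area placement and the function itself are assumptions for illustration, and persisting the `remaps` dictionary across reboots is left open:

```python
def remap_table(device, device_sectors, spare_start, remaps):
    """Build a dmsetup table that redirects bad sectors into a spare area.

    `remaps` maps a bad sector number to its slot index in the spare area,
    which begins at `spare_start` on the same device. Sectors below
    `spare_start` are exposed; the spare area itself is hidden.
    """
    lines = []
    cursor = 0
    for bad in sorted(remaps):
        spare = spare_start + remaps[bad]
        if not 0 <= bad < spare_start:
            raise ValueError("bad sector lies outside the exposed area")
        if not spare_start <= spare < device_sectors:
            raise ValueError("spare slot lies outside the device")
        if bad > cursor:
            # Untouched run before the bad sector maps straight through.
            lines.append(f"{cursor} {bad - cursor} linear {device} {cursor}")
        lines.append(f"{bad} 1 linear {device} {spare}")
        cursor = bad + 1
    if cursor < spare_start:
        lines.append(f"{cursor} {spare_start - cursor} linear {device} {cursor}")
    return "\n".join(lines)
```

Each table line follows the `start length linear device offset` form, in 512-byte sectors. A table with many remapped sectors grows by two lines per sector, so this only suits a modest number of bad spots.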

MfG
        Goswin


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16  9:31               ` Majed B.
@ 2009-09-16  9:44                 ` Matthias Urlichs
  2009-09-16  9:52                   ` Majed B.
  2009-09-16 10:00                   ` Robin Hill
  0 siblings, 2 replies; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-16  9:44 UTC (permalink / raw)
  To: Majed B.; +Cc: linux-raid

On Wed, 2009-09-16 at 12:31 +0300, Majed B. wrote:
> I have a question which would probably sound stupid: If I have a bad
> blocks output file from dd_rescue, can I reconstruct a bad sector's
> data by reading the same sector from all disks (using dd if=/dev/sdx
> of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run a
> normal XOR operation, write zeros to the bad block to force sector
> remap, then dd the XOR output to the said sector?

Well, of course. Assuming that the disk's sector remap works, which was
my problem, and that we're talking about RAID5.




* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16  9:44                 ` Matthias Urlichs
@ 2009-09-16  9:52                   ` Majed B.
  2009-09-16 13:05                     ` Alex Butcher
  2009-09-16 10:00                   ` Robin Hill
  1 sibling, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-16  9:52 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

That's good, I guess, but I fell into what seems to be a problem yesterday.

I've mentioned before that I have 8 disks in an array. 7 of which
belong to it (degraded), and one doesn't. That outsider disk had bad
sectors. I wrote zeros to the disk yesterday and both Pending and
Offline counts have been reset, but Reallocation count didn't
increase. I did run an immediate offline smartd test after zeroing the
disk...

Does that make sense?!

-- 
       Majed B.


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16  9:44                 ` Matthias Urlichs
  2009-09-16  9:52                   ` Majed B.
@ 2009-09-16 10:00                   ` Robin Hill
  2009-09-16 10:07                     ` Majed B.
  1 sibling, 1 reply; 33+ messages in thread
From: Robin Hill @ 2009-09-16 10:00 UTC (permalink / raw)
  To: linux-raid

On Wed Sep 16, 2009 at 11:44:26AM +0200, Matthias Urlichs wrote:

> On Wed, 2009-09-16 at 12:31 +0300, Majed B. wrote:
> > I have a question which would probably sound stupid: If I have a bad
> > blocks output file from dd_rescue, can I reconstruct a bad sector's
> > data by reading the same sector from all disks (using dd if=/dev/sdx
> > of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run a
> > normal XOR operation, write zeros to the bad block to force sector
> > remap, then dd the XOR output to the said sector?
> 
> Well, of course. Assuming that the disk's sector remap works, which was
> my problem, and that we're talking about RAID5.
> 
And also assuming that the array starts from the same sector of each
disk.
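
To make the offset caveat concrete, this small Python sketch converts a sector number relative to the array data on one member into an absolute sector on the whole disk. It assumes all values are in 512-byte sectors and that `data_offset` is whatever the md superblock reports for that member (0 when the data begins at the start of the partition); the function name and the numbers in the usage note are illustrative:

```python
def disk_sector(member_sector, partition_start, data_offset=0):
    """Translate an array-data-relative sector into a whole-disk sector."""
    if min(member_sector, partition_start, data_offset) < 0:
        raise ValueError("sector values must be non-negative")
    return partition_start + data_offset + member_sector
```

For example, with a partition starting at sector 63 and no data offset, member sector 1000 is disk sector 1063. If two members differ in either value, the same stripe lives at different raw sector numbers on each disk.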

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16 10:00                   ` Robin Hill
@ 2009-09-16 10:07                     ` Majed B.
  0 siblings, 0 replies; 33+ messages in thread
From: Majed B. @ 2009-09-16 10:07 UTC (permalink / raw)
  To: linux-raid

Thank you for the heads up, Robin.

I've just checked and it seems that they do start from the same sector:

/dev/sdg: WDC WD10EADS-65L5B1
/dev/sdh: MAXTOR STM31000340AS
root@Adam:/boot# fdisk -l /dev/sd[g-h]

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1      121601   976760001   fd  Linux raid autodetect


There are other disks in the array, but the rest are all WD disks and
have a similar structure to the one above.

-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16  9:52                   ` Majed B.
@ 2009-09-16 13:05                     ` Alex Butcher
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Butcher @ 2009-09-16 13:05 UTC (permalink / raw)
  To: Majed B.; +Cc: Matthias Urlichs, linux-raid

On Wed, 16 Sep 2009, Majed B. wrote:

> I wrote zeros to the disk yesterday and both Pending and Offline counts
> have been reset, but Reallocation count didn't increase.

Soft, rather than hard errors, presumably. These can occur if a drive is
writing when power is unexpectedly removed.

HTH,
Alex


* Re: Re-map disk sectors in userspace when rewriting after read errors
  2009-09-16  9:41         ` Goswin von Brederlow
@ 2009-09-16 13:13           ` Matthias Urlichs
  2009-09-18  8:17             ` Majed B.
  0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-16 13:13 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid

On Wed, 2009-09-16 at 11:41 +0200, Goswin von Brederlow wrote:
> Alternatively write a re-map device-mapper target that reserves some
> space of the disk and remaps bad blocks itself.
> 
That'd require some place to store the mapping so that the whole thing
still works after a reboot. Which should probably be on a different
disk. 

I tend to want to move (part of) that problem to userspace; you may want
to do more than a simple remapping of a few blocks when that happens
(e.g. test-reading the surrounding area).
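
The persistence requirement can be illustrated with a minimal userspace sketch. Everything here is hypothetical and not part of md or the Device Mapper: the `RemapTable` class, the JSON file format, and the idea of handing out spare sectors sequentially from a reserved area are assumptions made purely for illustration. The table file would live on a different disk, as suggested above.

```python
import json
import os
import tempfile


class RemapTable:
    """Hypothetical bad-sector remap table, persisted as JSON so it survives a reboot."""

    def __init__(self, path, spare_start, spare_count):
        self.path = path
        self.spare_start = spare_start  # first sector of the reserved spare area
        self.spare_count = spare_count  # number of reserved spare sectors
        self.mapping = {}               # bad sector -> spare sector
        if os.path.exists(path):
            with open(path) as fh:
                # JSON object keys are strings, so convert them back to integers.
                self.mapping = {int(k): v for k, v in json.load(fh).items()}

    def remap(self, bad_sector):
        """Assign a spare sector to bad_sector and persist the table."""
        if bad_sector in self.mapping:
            return self.mapping[bad_sector]
        if len(self.mapping) >= self.spare_count:
            raise RuntimeError("no spare sectors left")
        spare = self.spare_start + len(self.mapping)
        self.mapping[bad_sector] = spare
        self._save()
        return spare

    def resolve(self, sector):
        """Return the sector that should actually be accessed."""
        return self.mapping.get(sector, sector)

    def _save(self):
        # Write to a temporary file and rename it over the old table, so a
        # crash in the middle of a save cannot leave a half-written table.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump(self.mapping, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)
```

A daemon built along these lines could do extra work in `remap()` before answering, such as test-reading the neighbouring sectors.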


* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18  8:17 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: linux-raid

I've re-read this thread and I was wondering if: echo check >
/sys/block/$array/md/sync_action would help me (and possibly Matthias)
in any way.

I have a RAID5 array of 8 disks running degraded on 7. One of the 7
has bad sectors and the one that is not in the array also had bad
sectors.

I zeroed the one out of the array (with dd) and then cloned the one
with bad sectors in the array to it using dd_rescue.

Later, I reassembled the array using the cloned disk instead of the original.

So now, I'm sure I still have inconsistencies, but would doing the
action above force a correction? Also, would that work on a degraded
array?

Thank you.

On Wed, Sep 16, 2009 at 4:13 PM, Matthias Urlichs <matthias@urlichs.de> wrote:
> On Wed, 2009-09-16 at 11:41 +0200, Goswin von Brederlow wrote:
>> Alternatively write a re-map device-mapper target that reserves some
>> space of the disk and remaps bad blocks itself.
>>
> That'd require some place to store the mapping so that the whole thing
> still works after a reboot. Which should probably be on a different
> disk.
>
> I tend to want to move (part of) that problem to userspace; you may want
> to do more than a simple remapping of a few blocks when that happens
> (e.g. test-reading the surrounding area).
>



-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18  8:28 UTC (permalink / raw)
  To: linux-raid

On Fri Sep 18, 2009 at 11:17:27AM +0300, Majed B. wrote:

> I've re-read this thread and I was wondering if: echo check >
> /sys/block/$array/md/sync_action would help me (and possibly Matthias)
> in any way.
> 
> I have a RAID5 array of 8 disks running degraded on 7. One of the 7
> has bad sectors and the one that is not in the array also had bad
> sectors.
> 
> I zeroed the one out of the array (with dd) and then cloned the one
> with bad sectors in the array to it using dd_rescue.
> 
> Later, I reassembled the array using the cloned disk instead of the original.
> 
> So now, I'm sure I still have inconsistencies, but would doing the
> action above force a correction? Also, would that work on a degraded
> array?
> 
All the 'check' action does is validate that the checksum matches the
data.  By doing this, it will also be doing a full read check on the
array (though I'm not certain what action is taken on read failures).
The 'repair' action will also rewrite any checksums which don't match
the data.

All of this requires a non-degraded array, so I suspect the 'check' and
'repair' actions will get ignored altogether on a degraded array (and
certainly won't actually work).  As the array is degraded, you _can't_
have any RAID inconsistencies.  You may have some filesystem
inconsistencies (a fsck is definitely recommended) and/or data
inconsistencies (unless you have checksums or backups to compare against,
you have no way of finding these, though).
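
As a rough sketch of how the 'check' and 'repair' actions might be driven from a script: the md sysfs directory exposes the `sync_action`, `mismatch_cnt` and `degraded` attributes, and the "checksum" being compared is the parity in the RAID5 case. The helper names and the `sysfs_root` parameter below are assumptions for illustration only, and the sketch simply refuses to start anything on a degraded array, in line with the caveat above.

```python
from pathlib import Path


def md_dir(array, sysfs_root="/sys/block"):
    """Directory holding the md attributes for an array such as 'md0'."""
    return Path(sysfs_root) / array / "md"


def is_degraded(array, sysfs_root="/sys/block"):
    """True if md reports at least one missing member device."""
    return int((md_dir(array, sysfs_root) / "degraded").read_text()) > 0


def mismatch_count(array, sysfs_root="/sys/block"):
    """Mismatch counter reported by the most recent check or repair run."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text())


def start_sync_action(array, action, sysfs_root="/sys/block"):
    """Request 'check', 'repair' or 'idle'; refuse check/repair when degraded."""
    if action not in ("check", "repair", "idle"):
        raise ValueError("unsupported action: %r" % action)
    if action != "idle" and is_degraded(array, sysfs_root):
        raise RuntimeError("array is degraded; check/repair would not be useful")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")
```

On a real system this needs root, for example `start_sync_action("md0", "check")` followed later by `mismatch_count("md0")`.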

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18  9:57 UTC (permalink / raw)
  To: linux-raid

Thank you for the insight, Robin.

I already have used dd_rescue to find which sectors are bad, so I
guess I could either wait for Matthias to finish his modifications to
mdadm, or I can reconstruct the bad sectors manually (read same sector
from other disks, xor all, write to damaged disk's clone).

Weird thing though, is that when I re-read some of the bad sectors, I
didn't get I/O errors ... it's confusing!

Also, I'd rather avoid a fsck when I have bad sectors to not lose
files. I'll run fsck once I've fixed the bad sectors and resynced the
array.

On Fri, Sep 18, 2009 at 11:28 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> All the 'check' action does is validate that the checksum matches the
> data.  By doing this, it will also be doing a full read check on the
> array (though I'm not certain what action is taken on read failures).
> The 'repair' action will also rewrite any checksums which don't match
> the data.
>
> All of this requires a non-degraded array, so I suspect the 'check' and
> 'repair' actions will get ignored altogether on a degraded array (and
> certainly won't actually work).  As the array is degraded, you _can't_
> have any RAID inconsistencies.  You may have some filesystem
> inconsistencies (a fsck is definitely recommended) and/or data
> inconsistencies (unless you have checksums or backups to compare against,
> you have no way of finding these, though).
>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |
>



-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18 10:22 UTC (permalink / raw)
  To: linux-raid

On Fri Sep 18, 2009 at 12:57:23PM +0300, Majed B. wrote:

> Thank you for the insight, Robin.
> 
> I already have used dd_rescue to find which sectors are bad, so I
> guess I could either wait for Matthias to finish his modifications to
> mdadm, or I can reconstruct the bad sectors manually (read same sector
> from other disks, xor all, write to damaged disk's clone).
> 
This won't work if your array is degraded though - you don't have enough
data to do the reconstruction (unless you have two failed drives you can
partially read?).

> Weird thing though, is that when I re-read some of the bad sectors, I
> didn't get I/O errors ... it's confusing!
> 
Odd.  I'd recommend using ddrescue rather than dd_rescue - it's faster
and handles retries of bad sectors better.

> Also, I'd rather avoid a fsck when I have bad sectors to not lose
> files. I'll run fsck once I've fixed the bad sectors and resynced the
> array.
> 
True - a fsck should only be done once the data's in the best possible
state.

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18 10:52 UTC (permalink / raw)
  To: linux-raid

Well, I think my case is different from Matthias's, and I can't reconstruct
the data anymore, as you said, Robin.

So this leaves me with a degraded array with bad sectors and a dodgy filesystem.

You see, I can mount the LVM Logical Volume (formatted with XFS), but
as soon as I hit some bad sectors, XFS complains and then one of the
array disks drops out.
Just now, one disk exited the array and renamed itself from sdg to sdj
.... (this is the first time this has happened). According to smartctl -a
/dev/sdj, there are no bad sectors, but I still get this in
/var/log/messages

Sep 18 07:01:38 Adam kernel: [316599.950147] sd 6:0:0:0: [sdg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 07:01:38 Adam kernel: [316599.950175] raid5:md0: read error not correctable (sector 1240859816 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950223] raid5:md0: read error not correctable (sector 1240859824 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950225] raid5:md0: read error not correctable (sector 1240859832 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950227] raid5:md0: read error not correctable (sector 1240859840 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950230] raid5:md0: read error not correctable (sector 1240859848 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950232] raid5:md0: read error not correctable (sector 1240859856 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950234] raid5:md0: read error not correctable (sector 1240859864 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950236] raid5:md0: read error not correctable (sector 1240859872 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950238] raid5:md0: read error not correctable (sector 1240859880 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950240] raid5:md0: read error not correctable (sector 1240859888 on sdg1).
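
For anyone wanting to turn messages like these into a list of affected sectors, a small sketch follows. The function names are made up for illustration, and the regular expression only assumes the "read error not correctable (sector N on DEV)" wording seen in this log. Whitespace is matched loosely, so entries that a mail client wrapped across two lines are still found.

```python
import re

# Matches the md message text; \s+ also spans line breaks added by wrapping.
ERROR_RE = re.compile(
    r"read error not\s+correctable\s+\(sector\s+(\d+)\s+on\s+(\w+)\)"
)


def uncorrectable_sectors(log_text):
    """Return (device, sector) pairs for every matching log entry, in order."""
    return [(dev, int(sector)) for sector, dev in ERROR_RE.findall(log_text)]


def summarize(pairs):
    """Report the first and last sector, the count, and the distinct strides."""
    if not pairs:
        raise ValueError("no read errors found")
    sectors = sorted(sector for _, sector in pairs)
    strides = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    return {
        "first": sectors[0],
        "last": sectors[-1],
        "count": len(sectors),
        "strides": strides,
    }
```

Applied to the entries above, this yields ten sectors on sdg1 from 1240859816 to 1240859888 with a single stride of 8 sectors, which is 4 KiB at 512 bytes per sector.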

When the disk exits the array, the array becomes unusable (only 6 of its
8 disks remain) and XFS complains:

Sep 18 07:01:46 Adam kernel: [316607.896293] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896374] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896453] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.

Here's some output from smartctl -a /dev/sdg:
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I can't find an explanation for why the disks are behaving this way...

====================================================

Plan B: Since I cloned the disk with bad sectors to another, what
would happen if I zeroed the damaged one then cloned the clone to it?!

I do realize that there will be zeros in the areas of bad sectors, but
how will mdadm/md behave? Would a resync fail?

I can run fsck at that point and files residing on bad sectors will be
the only affected ones, correct?

On Fri, Sep 18, 2009 at 1:22 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Fri Sep 18, 2009 at 12:57:23PM +0300, Majed B. wrote:
>
>> Thank you for the insight, Robin.
>>
>> I already have used dd_rescue to find which sectors are bad, so I
>> guess I could either wait for Matthias to finish his modifications to
>> mdadm, or I can reconstruct the bad sectors manually (read same sector
>> from other disks, xor all, write to damaged disk's clone).
>>
> This won't work if your array is degraded though - you don't have enough
> data to do the reconstruction (unless you have two failed drives you can
> partially read?).
>
>> Weird thing though, is that when I re-read some of the bad sectors, I
>> didn't get I/O errors ... it's confusing!
>>
> Odd.  I'd recommend using ddrescue rather than dd_rescue - it's faster
> and handles retries of bad sectors better.
>
>> Also, I'd rather avoid a fsck when I have bad sectors to not lose
>> files. I'll run fsck once I've fixed the bad sectors and resynced the
>> array.
>>
> True - a fsck should only be done once the data's in the best possible
> state.
>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |
>



-- 
       Majed B.

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18 11:15 UTC (permalink / raw)
  To: linux-raid

On Fri Sep 18, 2009 at 01:52:14PM +0300, Majed B. wrote:

> Well, I think my case is different from Matthias's, and I can't reconstruct
> the data anymore, as you said, Robin.
> 
> So this leaves me with a degraded array with bad sectors and a dodgy
> filesystem.
> 
> You see, I can mount the LVM Logical Volume (formatted with XFS), but
> as soon as I hit some bad sectors, XFS complains and then one of the
> array disks drops out.
> Just now, one disk exited the array and renamed itself from sdg to sdj
> .... (this is the first time this has happened). According to smartctl -a
> /dev/sdj, there are no bad sectors, but I still get this in
> /var/log/messages
> 
The renaming would suggest a hard bus reset - not what I'd expect with
just a bad block.

> Here's some info on smartctl -a /dev/sdg
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
> 
A lot of these are only updated via offline tests, so won't change in
normal use, even if there are issues.  Have you run any SMART tests on
the disk?  The long test usually shows a failure if the disk has read
errors.
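
To see which attributes in such a listing are only refreshed by offline tests, the table can be split on whitespace and the UPDATED column inspected. This is only a sketch: it assumes each attribute sits on one unwrapped line with the ten columns smartctl prints (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value), and the function names are invented for the example.

```python
def parse_smart_attributes(text):
    """Parse smartctl attribute rows into dictionaries.

    Lines that do not start with a numeric attribute ID or that have
    fewer than ten whitespace-separated columns are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "updated": fields[7],
            # The raw value may contain spaces, so keep everything that is left.
            "raw": " ".join(fields[9:]),
        })
    return rows


def offline_only(rows):
    """Names of attributes whose UPDATED column reads 'Offline'."""
    return [row["name"] for row in rows if row["updated"] == "Offline"]
```

For the seven attributes quoted above, `offline_only` would pick out Offline_Uncorrectable and Multi_Zone_Error_Rate.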

> Plan B: Since I cloned the disk with bad sectors to another, what
> would happen if I zeroed the damaged one then cloned the clone to it?!
> 
Depends on what the actual condition of the disk is.  The zeroing should
remap any bad blocks though.

> I do realize that there will be zeros in the areas of bad sectors, but
> how will mdadm/md behave? Would a resync fail?
> 
mdadm doesn't care what data is on it, as long as the array metadata is
valid.  Providing all disks are readable (and the new disk is writable)
then a resync would certainly work - whether the filesystem will be
usable afterwards depends on how many zeroed blocks there are and where
they fall.

> I can run fsck at that point and files residing on bad sectors will be
> the only affected ones, correct?
> 
Files/directories yes - if the directory inodes get zeroed then all the
files within the directory will be affected (renamed & moved to
/lost+found).

I've had to do just this myself recently, and despite the low number of
zeroed blocks, there was an awful lot of filesystem damage (I ended up
restoring most of it from backup).


    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-18 11:35 UTC (permalink / raw)
  To: Majed B.; +Cc: linux-raid

On Fri, 2009-09-18 at 11:17 +0300, Majed B. wrote:
> 
> I have a RAID5 array of 8 disks running degraded on 7. One of the 7
> has bad sectors and the one that is not in the array also had bad
> sectors.

If you run a check on a degraded array and the check runs into errors it
can't recover from, I assume that the disk will get kicked off and
you'll have a nonfunctional array instead.

Not something I'd do in your situation.

I'll try to finish my patch ASAP.

It should be possible to convince the code to read from the offline disk
when absolutely necessary, but no guarantee that I'll get that in right
away. (On second thought, this only matters for RAID6.)

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: John Robinson @ 2009-09-18 17:44 UTC (permalink / raw)
  To: Linux RAID

On 18/09/2009 12:35, Matthias Urlichs wrote:
[...]
> If you run a check on a degraded array and the check runs into errors it
> can't recover from, I assume that the disk will get kicked off and
> you'll have a nonfunctional array instead.

No, I don't think so - at least with RAID-1, md doesn't drop the array 
on errors on the one remaining functional disc, on the grounds that some 
data is better than none, but I don't know whether the array gets 
switched to read-only or what the situation is with other RAID levels.

Cheers,

John.

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Greg Freemyer @ 2009-09-18 18:02 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID

All,

I keep forgetting to ask, but the subject of this thread makes me
wonder if you guys are familiar with the hdparm features of
"--make-bad-sector", "--read-sector", and "--write-sector".

I don't know if any of those can be used to force a sector to be
remapped, but I could see a user space process like:

identify corrupt sector
hdparm --make-bad-sector   (to get it as corrupt as linux knows how).
calculate correct value
write new value to sector the normal way (hopefully the drive will
remap the bad sector)

hdparm --read-sector will do a low level read of the sector, including
the sector header and checksum as I understand it.  I'm not sure all
that gets back to userspace.

hdparm --write-sector will force a sector to be rewritten.  I don't
believe it is meant to ever cause a sector remap.  Of course you never
know what a disk drive is going to do for any given command.

Mark Lord is of course the expert on all things hdparm.

Greg

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18 20:13 UTC (permalink / raw)
  To: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]

Greg,

You don't really need to use hdparm. You can use dd to overwrite the
bad sectors with zeros which forces the disk to remap the sector.

As for calculating the new data, a friend of mine wrote me a Java
program that takes in any number of input files and XORs them, then
writes the output to a file.
The input files are the sectors' data from other disks.

I have attached the program in case anyone is interested. Courtesy of
Eng. Hisham Farahat, who wrote the program "sector xor, or sexor, as I
call it"

java -jar sexor.jar file1 file2 ... fileN

The output file will always be called "out" -- do not include it in
the input list.
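
For readers without the attachment, the same idea can be sketched in a few lines of Python. This is not the attached program, just an illustration of the technique under the assumption that every input file holds the same sector (same length) read from a different member disk; the function names are made up, and the default output name "out" merely mirrors the behaviour described above.

```python
def xor_blocks(blocks):
    """XOR equally sized byte strings together and return the result."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def xor_files(input_paths, output_path="out"):
    """XOR the contents of the input files and write the result to output_path."""
    blocks = []
    for path in input_paths:
        with open(path, "rb") as fh:
            blocks.append(fh.read())
    data = xor_blocks(blocks)
    with open(output_path, "wb") as fh:
        fh.write(data)
    return data
```

With RAID5, XORing the corresponding sector from every surviving member (data and parity alike) gives back the contents of the one missing sector, which is why this only works while all the other members are readable.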

On Fri, Sep 18, 2009 at 9:02 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> All,
>
> I keep forgetting to ask, but the subject of this thread makes me
> wonder if you guys are familiar with the hdparm features of
> "--make-bad-sector", "--read-sector", and "--write-sector".
>
> I don't know if any of those can be used to force a sector to be
> remapped, but I could see a user space process like:
>
> identify corrupt sector
> hdparm --make-bad-sector   (to get it as corrupt as linux knows how).
> calculate correct value
> write new value to sector the normal way (hopefully the drive will
> remap the bad sector)
>
> hdparm --read-sector will do a low level read of the sector, including
> the sector header and checksum as I understand it.  I'm not sure all
> that gets back to userspace.
>
> hdparm --write-sector will force a sector to be rewritten.  I don't
> believe it is meant to ever cause a sector remap.  Of course you never
> know what a disk drive is going to do for any given command.
>
> Mark Lord is of course the expert on all things hdparm.
>
> Greg



-- 
       Majed B.

[-- Attachment #2: sexor.jar --]
[-- Type: application/java-archive, Size: 4054 bytes --]

* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Bill Davidsen @ 2009-10-02 13:55 UTC (permalink / raw)
  To: Majed B.; +Cc: Linux RAID

Majed B. wrote:
> Greg,
>
> You don't really need to use hdparm. You can use dd to overwrite the
> bad sectors with zeros which forces the disk to remap the sector.
>   

From the description of the problem, I would expect the md code to have 
rewritten the sector, and the problem is that the failed write isn't 
detected or somehow the write doesn't cause a relocation. That's my 
reading of the previous discussion: the disk firmware is crap.

Newegg.com had TB drives on sale for about $65 or so; it's hard to justify 
the time to live with crap, not to mention that the same grotty firmware 
which isn't getting the bad block remapped may be returning bad data 
without warning. That would bother me.

-- 
Bill Davidsen <davidsen@tmr.com>
  Unintended results are the well-earned reward for incompetence.
