RE: Mapping physical disk block to logical block to selectively repair w/o forcing rescan

linux-raid.vger.kernel.org archive mirror

* RE: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
@ 2008-04-15 20:47 David Lethe
  2008-04-15 22:14 ` Janek Kozicki
  2008-04-16  0:35 ` Dan Williams
  0 siblings, 2 replies; 8+ messages in thread
From: David Lethe @ 2008-04-15 20:47 UTC (permalink / raw)
  To: linux-raid

I have some code that does some background media scanning that results in a list of physical disks and block numbers that are known bad. I want to repair the logical (md) block(s) that they correspond to (assuming not RAID0, of course) without a draconian full repair/rescan.  As such, I need a physical to logical mapping technique.  It doesn't appear that there is a built-in mechanism in mdadm or by echoing a command to the /proc/mdstat, or anything else to force the md subsystem to repair the (parity protected) logical block associated with the physical disk and block that is known bad.  

(raid5extend.c has a phys2log function that seems to take everything into consideration, and RAID1 is a non-issue,  but as long as I am going to do this, then might as well cover all the bases and make sure it is done right) 

Has anybody written a script or something, or is there a technique I have missed that will provide block-level mapping?  I don't want to muck with modifying the md driver ... a shell script would be fine as this would be a rare occurrence, and since the disk will take billions of clock cycles to remap a bad sector, then an inefficient mapping script will hardly be noticeable.   

Also, assuming I have to write such a script and now know that block X on /dev/mdY needs to be repaired, is there any risk of data corruption if I simply issue a read such as "dd if=/dev/md$Y of=/dev/null count=1 skip=$X" to force the md driver to repair the stripe?
(Actually, count should be made large enough to force a non-cached read, and I think the block number might need to be examined to make sure that it is not associated with a block that contains parity information; otherwise, the md engine might not detect the problem and force a parity rebuild.)
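
One way to avoid relying on a large count to defeat caching is to ask dd for a direct read. The following is a minimal sketch, assuming GNU dd (which accepts iflag=direct to open the input with O_DIRECT) and 512-byte sectors; the helper name dd_read_command is hypothetical, and it only builds the argument list rather than running anything. A read through md is still not guaranteed to touch any particular member disk.

```python
def dd_read_command(md_device, sector, count=1, sector_size=512):
    """Build an argv list for an uncached read of `count` sectors from an md device.

    Assumptions (not from the original post): GNU dd is installed and the
    device accepts O_DIRECT reads at `sector_size` granularity.
    """
    if sector < 0 or count < 1:
        raise ValueError("sector must be >= 0 and count must be >= 1")
    return [
        "dd",
        f"if={md_device}",
        "of=/dev/null",
        f"bs={sector_size}",   # skip= and count= are expressed in units of bs
        f"skip={sector}",
        f"count={count}",
        "iflag=direct",        # GNU dd: bypass the page cache on the input side
    ]


if __name__ == "__main__":
    # Print the command instead of executing it; run it by hand once reviewed.
    print(" ".join(dd_read_command("/dev/md0", 123456, count=8)))
```

The list can be passed to subprocess.run() as-is once the device name and sector have been checked.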

Any advice, comments, code will be appreciated.  

David @ santools ^ com



* Re: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-15 20:47 Mapping physical disk block to logical block to selectively repair w/o forcing rescan David Lethe
@ 2008-04-15 22:14 ` Janek Kozicki
  2008-04-16  0:35 ` Dan Williams
  1 sibling, 0 replies; 8+ messages in thread
From: Janek Kozicki @ 2008-04-15 22:14 UTC (permalink / raw)
  To: linux-raid

David Lethe said:     (by the date of Tue, 15 Apr 2008 15:47:05 -0500)

> I have some code that does some background media scanning that
> results in a list of physical disks and block numbers that are known bad.

Wow, that's nice code. Can you share it?

-- 
Janek Kozicki                                                         |


* Re: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-15 20:47 Mapping physical disk block to logical block to selectively repair w/o forcing rescan David Lethe
  2008-04-15 22:14 ` Janek Kozicki
@ 2008-04-16  0:35 ` Dan Williams
  2008-04-16  0:37   ` Dan Williams
  1 sibling, 1 reply; 8+ messages in thread
From: Dan Williams @ 2008-04-16  0:35 UTC (permalink / raw)
  To: David Lethe; +Cc: linux-raid

On Tue, Apr 15, 2008 at 1:47 PM, David Lethe <david@santools.com> wrote:
> I have some code that does some background media scanning that results in a list of physical disks and block numbers that are known bad. I want to repair the logical (md) block(s) that they correspond to (assuming not RAID0, of course) without a draconian full repair/rescan.  As such, I need a physical to logical mapping technique.  It doesn't appear that there is a built-in mechanism in mdadm or by echoing a command to the /proc/mdstat, or anything else to force the md subsystem to repair the (parity protected) logical block associated with the physical disk and block that is known bad.
>
>  (raid5extend.c has a phys2log function that seems to take everything into consideration, and RAID1 is a non-issue,  but as long as I am going to do this, then might as well cover all the bases and make sure it is done right)
>
>  Has anybody written a script or something, or is there a technique I have missed that will provide block-level mapping?  I don't want to muck with modifying the md driver ... a shell script would be fine as this would be a rare occurrence, and since the disk will take billions of clock cycles to remap a bad sector, then an inefficient mapping script will hardly be noticeable.
>
>  Also, assuming I have to write such a script and now know that block X on /dev/mdY needs to be repaired, then is there any risk of data corruption if I simply issue a read  dd if=/dev/md$Y of=/dev/null count=1 skip=$X to force the md driver to repair the stripe?
>  (Actually, count should be increased large enough to force a non-cached read, and I think block number might need to be examined to make sure that it is not associated with a block that contains parity information, otherwise, the md engine might not detect the problem and force parity rebuild).
>
>  Any advice, comments, code will be appreciated.
>

raid5_compute_sector() in drivers/md/raid5.c maps the logical RAID
block to the physical block.  Is that what you are looking for?

--
Dan


* Re: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-16  0:35 ` Dan Williams
@ 2008-04-16  0:37   ` Dan Williams
  2008-04-16  2:47     ` David Lethe
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2008-04-16  0:37 UTC (permalink / raw)
  To: David Lethe; +Cc: linux-raid

>  >
>  >  Any advice, comments, code will be appreciated.
>  >
>
>  raid5_compute_sector() in drivers/md/raid5.c maps the logical RAID
>  block to the physical block.  Is that what you are looking for?
>

...and compute_blocknr() goes the other way.

--
Dan


* RE: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-16  0:37   ` Dan Williams
@ 2008-04-16  2:47     ` David Lethe
  2008-04-16  6:04       ` Dan Williams
  2008-04-16 13:58       ` Bill Davidsen
  0 siblings, 2 replies; 8+ messages in thread
From: David Lethe @ 2008-04-16  2:47 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-raid

I have the physical disk sector/drive, so I will have to go backwards.
That means using compute_blocknr, factoring in the chunk size and stripe
size, and looking at the raid5_private_data to get everything else,
including whether or not it is in a rebuild and what position the disk
has in the stripe, among other things .. and repeat for RAID6.  Still all
scriptable .. as long as I keep the block calculations in 64 bits when on
a 32-bit kernel.
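
The backwards mapping can be sketched in a few lines of script. This is a minimal sketch, not the kernel's code: it assumes RAID5 with the left-symmetric layout (layout number 2, the mdadm default), an array that is not reshaping, and a disk index that is the member's raid slot number rather than its position in any listing. The function names phys_to_logical and logical_to_phys and the data_offset parameter are hypothetical; the arithmetic follows what raid5_compute_sector() and compute_blocknr() do for that one layout. Python integers are arbitrary precision, so the 64-bit concern does not arise here.

```python
def logical_to_phys(array_sector, raid_disks, chunk_sectors):
    """Map an array sector to (disk_idx, member_sector); left-symmetric RAID5 only."""
    data_disks = raid_disks - 1
    chunk_number, chunk_offset = divmod(array_sector, chunk_sectors)
    stripe, dd_idx = divmod(chunk_number, data_disks)
    pd_idx = data_disks - (stripe % raid_disks)      # slot holding parity for this row
    disk_idx = (pd_idx + 1 + dd_idx) % raid_disks    # data starts just after parity
    return disk_idx, stripe * chunk_sectors + chunk_offset


def phys_to_logical(member_sector, disk_idx, raid_disks, chunk_sectors, data_offset=0):
    """Map a member-disk sector back to the array.

    Returns ("data", array_sector) or ("parity", None).
    `data_offset` (an assumption of this sketch) is where md data begins on
    the member, in sectors; it is 0 for 0.90 superblocks.
    """
    sector = member_sector - data_offset
    if sector < 0:
        raise ValueError("sector lies before the start of the md data area")
    data_disks = raid_disks - 1
    stripe, chunk_offset = divmod(sector, chunk_sectors)
    pd_idx = data_disks - (stripe % raid_disks)
    if disk_idx == pd_idx:
        return ("parity", None)                      # no array sector maps here
    i = disk_idx
    if i < pd_idx:
        i += raid_disks
    i -= pd_idx + 1                                  # position among the data chunks
    chunk_number = stripe * data_disks + i
    return ("data", chunk_number * chunk_sectors + chunk_offset)


if __name__ == "__main__":
    # 4-disk array, 64 KiB chunks (128 sectors): where does member 1, sector 300 land?
    print(phys_to_logical(300, 1, raid_disks=4, chunk_sectors=128))
```

Other layouts and RAID6 need their own parity-position rules, so the result should be cross-checked against the running kernel's raid5.c before anything is repaired on the strength of it.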

I can parse mdadm -Q -D  to get health and configuration, or get it from
sysfs, haven't decided.
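
For the sysfs route, a minimal sketch of reading the array geometry follows. It assumes the md attributes level, raid_disks, layout and chunk_size (in bytes) under /sys/block/mdN/md/, as described in the kernel's md documentation; the helper name read_md_geometry and the sysfs_root parameter are hypothetical, the latter added only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_md_geometry(md_name, sysfs_root="/sys/block"):
    """Read basic geometry of an md array from sysfs.

    Returns a dict with level, raid_disks, layout and chunk_sectors
    (chunk size converted from bytes to 512-byte sectors).
    """
    md_dir = Path(sysfs_root) / md_name / "md"

    def read(attr):
        return (md_dir / attr).read_text().strip()

    return {
        "level": read("level"),
        # During a reshape raid_disks can read like "4 (5)"; keep the first number.
        "raid_disks": int(read("raid_disks").split()[0]),
        "layout": int(read("layout")),
        "chunk_sectors": int(read("chunk_size")) // 512,
    }


if __name__ == "__main__":
    import sys
    name = sys.argv[1] if len(sys.argv) > 1 else "md0"
    print(read_md_geometry(name))
```

Parsing "mdadm -Q -D" output would give the same numbers, but the sysfs files need no text scraping.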

Now for recovery ... a change was made in 2.6.15 that affects how
/dev/md recalculates & corrects the error, but I don't think I have to
worry about it. Just directly read the /dev/md block that corresponds to
the faulty physical disk/sector.  This should just repair the bad block
w/o enticing the md system to fail over the entire disk.  The exception
would be if the disk with the bad block cannot remap it, due to a
catastrophic failure or a lack of spare sectors.

Even if the bad physical block lands on a parity block in the /dev/md
space, it should get rebuilt because it has to read the entire stripe to
figure out if there is a parity error, which there will be because one
disk will return the sense data indicating an unrecoverable read error,
so the md will repair the stripe to keep parity consistent for me.


* Re: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-16  2:47     ` David Lethe
@ 2008-04-16  6:04       ` Dan Williams
  2008-04-16 13:58       ` Bill Davidsen
  1 sibling, 0 replies; 8+ messages in thread
From: Dan Williams @ 2008-04-16  6:04 UTC (permalink / raw)
  To: David Lethe; +Cc: linux-raid

On Tue, Apr 15, 2008 at 7:47 PM, David Lethe <david@santools.com> wrote:
> I have the physical disk sector/drive, so I will have to go backwards.
>  That means using compute_blocknr, factoring the chunk size, stripe size,
>  look at the raid5_private_data to get everything else, including whether
>  or not it is in a rebuild, what position the disk has in the stripe,
>  among
>  other things .. and repeat for RAID6.  Still all scriptable .. as long
>  as I keep the block calculations in 64-bits when on 32-bit kernel.
>
>  I can parse mdadm -Q -D  to get health and configuration, or get it from
>  sysfs, haven't decided.
>
>  Now for recovery ... a change was made in 2.6.15 that affects how the
>  /dev/md recalculates & corrects the error, but I don't think I have to
>  worry about it. Just directly read the /dev/md block that corresponds to
>  the faulty physical disk/sector.  This should just repair the bad block
>  w/o enticing the md system to fail over the entire disk.  Exception
>  would be if the disk with bad block can remap due to a catastrophic
>  failure, or lack of spare sectors.
>
>  Even if the bad physical block lands on a parity block in the /dev/md
>  space, it should get rebuilt because it has to read the entire stripe to
>  figure out if there is a parity error, which there will be because one
>  disk will return the sense data indicating an unrecoverable read error,
>  so the md will repair the stripe to keep parity consistent for me.
>

There is no guarantee you can actually cause the bad block to be read
by doing a "dd if=/dev/mdN...".  The kernel will sometimes calculate a
disk without causing a read, although in  most cases it will directly
hit the disk.  For correcting parity disk bad blocks there is no way
to trigger a parity read without doing a resync operation or a write.
That said, it would not be too difficult to add an interface to tell
the kernel to try to read an entire stripe in order to trigger the bad
block recovery code.  Another aspect of the mechanism could be to have
the kernel not fail the disk and instead let userspace update a
badblocks(8) file to tell the filesystem to ignore that part of the
array...

--
Dan


* Re: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-16  2:47     ` David Lethe
  2008-04-16  6:04       ` Dan Williams
@ 2008-04-16 13:58       ` Bill Davidsen
  2008-04-16 14:32         ` David Lethe
  1 sibling, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2008-04-16 13:58 UTC (permalink / raw)
  To: David Lethe; +Cc: Dan Williams, linux-raid

David Lethe wrote:
> I have the physical disk sector/drive, so I will have to go backwards.
> That means using compute_blocknr, factoring the chunk size, stripe size,
> look at the raid5_private_data to get everything else, including whether
> or not it is in a rebuild, what position the disk has in the stripe,
> among
> other things .. and repeat for RAID6.  Still all scriptable .. as long
> as I keep the block calculations in 64-bits when on 32-bit kernel. 
>
>   
Or use "bc" to do really long calculations. It works well with scripts.

> I can parse mdadm -Q -D  to get health and configuration, or get it from
> sysfs, haven't decided.
>
> Now for recovery ... a change was made in 2.6.15 that affects how the
> /dev/md recalculates & corrects the error, but I don't think I have to
> worry about it. Just directly read the /dev/md block that corresponds to
> the faulty physical disk/sector.  This should just repair the bad block
> w/o enticing the md system to fail over the entire disk.  Exception
> would be if the disk with bad block can remap due to a catastrophic
> failure, or lack of spare sectors.  
>
> Even if the bad physical block lands on a parity block in the /dev/md
> space, it should get rebuilt because it has to read the entire stripe to
> figure out if there is a parity error, which there will be because one
> disk will return the sense data indicating an unrecoverable read error,
> so the md will repair the stripe to keep parity consistent for me.
>
>   
The problem I see with this is that using raid1 you can read an entire 
array end to end and never use one mirror of the data. So unless you 
perform the 'check' operation you won't really be sure that you have the 
errors mapped. I suspect that running check fixes more errors than 
'repair' on most systems.
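
A 'check' does not have to cover the whole array on kernels that expose the sync_min and sync_max attributes next to sync_action in /sys/block/mdN/md/. The sketch below only plans the sysfs writes for a limited check; it assumes those attributes are present (they are in later kernels, which may not include the ones discussed here), that the range is given in sectors and must be chunk-aligned as the kernel's md documentation states, and that the caller has confirmed from that documentation which sector space applies to the RAID level in use. The function name ranged_check_plan is hypothetical.

```python
def ranged_check_plan(md_name, first_sector, last_sector, chunk_sectors, action="check"):
    """Return the ordered (path, value) sysfs writes for a limited check or repair.

    The range is widened outward to chunk boundaries. Nothing is written
    here; applying the plan requires root and is left to the caller.
    """
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    if not 0 <= first_sector <= last_sector:
        raise ValueError("need 0 <= first_sector <= last_sector")
    start = (first_sector // chunk_sectors) * chunk_sectors
    end = (last_sector // chunk_sectors + 1) * chunk_sectors
    base = f"/sys/block/{md_name}/md"
    return [
        (f"{base}/sync_min", str(start)),
        (f"{base}/sync_max", str(end)),
        (f"{base}/sync_action", action),   # written last so the limits are already set
    ]


if __name__ == "__main__":
    for path, value in ranged_check_plan("md0", 1000, 1000, chunk_sectors=128):
        print(f"echo {value} > {path}")
```

After the pass finishes, sync_min and sync_max should be returned to 0 and "max" so that later full checks are not silently truncated.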

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck 




* RE: Mapping physical disk block to logical block to selectively repair w/o forcing rescan
  2008-04-16 13:58       ` Bill Davidsen
@ 2008-04-16 14:32         ` David Lethe
  0 siblings, 0 replies; 8+ messages in thread
From: David Lethe @ 2008-04-16 14:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Dan Williams, linux-raid

So now it looks like I am back to using badblocks?   Will this force a
parity rebuild on RAID1?
I expect it would if I gave it a read/write or write flag, but I can't
do that because filesystems will be mounted.  It appears that 2.6.15
kernels and above at least have potential to do a parity rebuild on
reads, and prior kernels need a write to force a parity rebuild.

i.e.,   badblocks -b 512 /dev/mdN (KnownBadBlock+2*StripeSizeInBlocks-1)
(KnownBadBlock-StripeSizeInBlocks)

(If the known bad block is in the first stripe, then just start at block
zero.) If I make the total number of blocks 3 times the stripe size, so
that it looks at the full stripes before and after the one that contains
the parity error, is this a good strategy for 2.6.15 kernels and up?
Nice thing about badblocks is that it scans at starting location and it
accepts a range, so I just give it a large enough area to scan that will
catch errors on parity blocks and prevent me from having to worry about
the layout of the parity info on the stripe.  
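
The range arithmetic for such a scan can be sketched as below. This assumes the badblocks(8) argument order of device, then last block, then first block, and the default read-only mode (no -w or -n), which is the only mode that is safe on a mounted array. The helper name badblocks_command is hypothetical, and the sketch does not clamp the last block to the array size.

```python
def badblocks_command(md_device, bad_block, stripe_blocks, block_size=512):
    """Build a read-only badblocks argv covering the stripe holding `bad_block`
    plus the full stripes immediately before and after it.

    `stripe_blocks` is the full stripe width in units of `block_size`.
    """
    if bad_block < 0 or stripe_blocks < 1:
        raise ValueError("bad_block must be >= 0 and stripe_blocks >= 1")
    stripe_index = bad_block // stripe_blocks
    first = max(0, stripe_index - 1) * stripe_blocks   # start at 0 if in the first stripe
    last = (stripe_index + 2) * stripe_blocks - 1      # end of the following stripe
    # badblocks takes: device, last block, first block (in that order).
    return ["badblocks", "-b", str(block_size), md_device, str(last), str(first)]


if __name__ == "__main__":
    # 3 data disks x 128-sector chunks = 384 blocks per stripe.
    print(" ".join(badblocks_command("/dev/md0", 1000, stripe_blocks=384)))
```

Aligning to stripe boundaries, rather than subtracting one stripe width from the bad block number, keeps the scan on whole stripes whatever the offset of the bad block within its stripe.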

Unfortunately I can't use badblocks with the read/write mode because
file system is mounted . Will the badblocks as used above force a
rebuild, or am I going to have to follow the badblocks with a fsck, or
fsck equivalent for whatever file system they used with the md driver?
