* Re-map disk sectors in userspace when rewriting after read errors
@ 2009-09-15 6:23 Matthias Urlichs
From: Matthias Urlichs @ 2009-09-15 6:23 UTC
To: linux-raid
Hi,
my problem is that I have a bunch of crappy disks which seem unable to
reliably remap bad areas after a read error.
This obviously makes the read error rewrite feature of our beloved
RAID5/6 code somewhat less than useful.
What I would like to do is to re-map these sectors in userspace -- either
by browbeating the disk into it, or by using the Device Mapper. So I'd
need a way to tell a userspace daemon "this device+block is unreadable",
and wait until said daemon tells the RAID core to go ahead.
I can do the userspace side easily, but my time to dig through the RAID
code and implement that sort of channel in a maintainable way is somewhat
limited. (Plus, I need that code sooner rather than later.)
Would somebody be able to help out? There may be some money in it ...
--
Matthias Urlichs
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: berk walker @ 2009-09-15 6:45 UTC
To: Matthias Urlichs; +Cc: linux-raid

Matthias Urlichs wrote:
> Hi,
>
> my problem is that I have a bunch of crappy disks which seem unable to
> reliably remap bad areas after a read error.
> [...]
> Would somebody be able to help out? There may be some money in it ...

I can not believe the question. What file system might this be?
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-15 7:23 UTC
To: linux-raid

On Tue, 15 Sep 2009 02:45:29 -0400, berk walker wrote:
> I can not believe the question. What file system might this be?

Umm, what's your problem with my question? And why would it matter which
file system I'm using?

_My_ problem is that I have a bunch of disks which are not as reliable
as I'd like. Yes I could go and buy a new heap of 1TB disks, but frankly
I'd like to avoid that. These disks are "good enough" for the data
that's on them. I'll replace one if it fails entirely -- assuming that I
can rebuild the RAID6 array when I do that. However, since the
rewrite-after-read code has caused bad sectors to accumulate on all of
these disks, I can't even do that at the moment.

(And, since there's no command which knows how to recover bad spots from
the other RAID disks yet (I hope to be able to work on _that_ problem
next week), I can't even use ddrescue to copy one almost-good disk to a
new one.)

--
Matthias Urlichs
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Alex Butcher @ 2009-09-15 7:13 UTC
To: Matthias Urlichs; +Cc: linux-raid

On Tue, 15 Sep 2009, Matthias Urlichs wrote:
> my problem is that I have a bunch of crappy disks which seem unable to
> reliably remap bad areas after a read error.

IME, discs don't remap after read errors, only on writes.

> This obviously makes the read error rewrite feature of our beloved
> RAID5/6 code somewhat less than useful.

Are you sure that refresh-writes triggered by read errors are expected
behaviour of md's RAID5/6 mode? Only they weren't for RAID1 until
somewhat recently (2.6.15, IIRC).

Best Regards,
Alex
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-15 7:29 UTC
To: linux-raid

On Tue, 15 Sep 2009 08:13:08 +0100, Alex Butcher wrote:
> IME, discs don't remap after read errors, only on writes.

Some may remap after recoverable read errors. However, the RAID code
does (I assume - see below) rewrite the data -- which the disk happily
acknowledges -- only to report the very same error next time that spot's
being read. :-(

> Are you sure that refresh-writes triggered by read errors are expected
> behaviour of md's RAID5/6 mode?

Not 100%, no -- but recovering the data while otherwise ignoring the
error (other than incrementing the error counter) would be a level of
foolishness I won't assume of the RAID code's authors.

--
Matthias Urlichs
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Alex Butcher @ 2009-09-15 7:37 UTC
To: Matthias Urlichs; +Cc: linux-raid

On Tue, 15 Sep 2009, Matthias Urlichs wrote:
> On Tue, 15 Sep 2009 08:13:08 +0100, Alex Butcher wrote:
>
>> IME, discs don't remap after read errors, only on writes.
>
> Some may remap after recoverable read errors. However, the RAID code
> does (I assume - see below) rewrite the data -- which the disk happily
> acknowledges -- only to report the very same error next time that spot's
> being read. :-(

Odd. If I hadn't observed something similar myself with a 40G Maxtor
(badblocks -w fails, wipe with dd if=/dev/zero, badblocks -w succeeds,
badblocks -w fails again), I wouldn't believe it. SMART seems to think
that it's nowhere near the reallocated sector count threshold. The only
conclusion I can come to is that the firmware is trash, or being way too
forgiving of inconsistently-performing spinning media. Either way, it's
not suitable for data I even care a little bit about.

What does SMART say about reallocated and pending sectors on your disks?
If the reallocated threshold has been crossed, this might be the failure
mode, I guess. What make/model are they?

>> Are you sure that refresh-writes triggered by read errors are expected
>> behaviour of md's RAID5/6 mode?
>
> Not 100%, no -- but recovering the data but otherwise ignoring the error
> (other than increment the error counter) would be a level of foolishness
> I won't assume of the RAID code's authors.

Well, the RAID code had been in the kernel and was being used in
production systems for quite some time before 2.6.15 came along. It took
a BSD user to point it out and a read through the kernel source for me
to believe it...

Cheers,
Alex
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-15 10:48 UTC
To: linux-raid

On Tue, 15 Sep 2009 08:37:52 +0100, Alex Butcher wrote:
> Either way, it's not
> suitable for data I even care a little bit about.

Ordinarily I'd agree with you. In this case, however, the data is mostly
read-only and on backup media. So I don't really care if the disks fall
off the edge of a cliff; the data will survive.

I can justify a moderate amount of time working on this, with the
hardware I have. I can't really justify buying eight new disks.

NB: Please don't dismiss this kind of setup out of hand. I know that
disks are cheap enough these days that the typical professional user
won't ever need to worry about not being able to replace hardware which
behaves like this. However, many people happen to be in a different
situation. :-/

--
Matthias Urlichs
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Goswin von Brederlow @ 2009-09-16 9:41 UTC
To: Matthias Urlichs; +Cc: linux-raid

Matthias Urlichs <matthias@urlichs.de> writes:
> I can justify a moderate amount of time working on this, with the
> hardware I have. I can't really justify buying eight new disks.
> [...]

How about making it re-read repaired blocks so it catches when the disk
didn't remap? I'm assuming the following happens:

1) disk read fails
2) raid rebuilds the block from parity
3) raid writes block to bad disk
4) disk writes data to the old block and fails to detect a write error
   that would trigger a remapping
5) re-read of the data succeeds because the data is still in the drive's
   disk cache
6) later read of the data fails because nothing was remapped

So you would need to write some repair-check-daemon that remembers
repaired blocks, waits for enough data to have passed through the drive
to flush the disk cache and then retries the block again. And again and
again till it stops giving errors.

Alternatively, write a re-map device-mapper target that reserves some
space on the disk and remaps bad blocks itself.

Regards,
Goswin
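
The repair-check daemon outlined above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not md code: the names `read_sector` and `RepairChecker`, the 512-byte sector size and the "N consecutive good reads" rule are all hypothetical, and nothing here waits for the drive cache to be flushed. A plain read like this can also be satisfied from the kernel's page cache, so a real daemon would probably want O_DIRECT.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_sector(device_path, sector, sector_size=SECTOR_SIZE):
    """Try to read one sector; return True only if a full sector came back.

    Note: this goes through the page cache. A real daemon would likely
    open the device with O_DIRECT to make sure the disk is actually hit.
    """
    try:
        fd = os.open(device_path, os.O_RDONLY)
    except OSError:
        return False
    try:
        data = os.pread(fd, sector_size, sector * sector_size)
        return len(data) == sector_size
    except OSError:
        return False
    finally:
        os.close(fd)


class RepairChecker:
    """Remembers rewritten sectors and re-reads them until they stay good."""

    def __init__(self, reader=read_sector, passes_required=3):
        self.reader = reader                  # callable(device, sector) -> bool
        self.passes_required = passes_required
        self.pending = {}                     # (device, sector) -> good reads in a row

    def note_repair(self, device, sector):
        """Record that md (or anyone else) has just rewritten this sector."""
        self.pending[(device, sector)] = 0

    def recheck(self):
        """Re-read every pending sector once; return those that failed again."""
        failed = []
        for key in list(self.pending):
            device, sector = key
            if self.reader(device, sector):
                self.pending[key] += 1
                if self.pending[key] >= self.passes_required:
                    del self.pending[key]     # looks stable, stop tracking it
            else:
                self.pending[key] = 0         # start counting from scratch
                failed.append(key)
        return failed
```

A caller would invoke `note_repair()` whenever a block has been rewritten and run `recheck()` periodically; anything it returns is a candidate for remapping by other means.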
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-16 13:13 UTC
To: Goswin von Brederlow; +Cc: linux-raid

On Wed, 2009-09-16 at 11:41 +0200, Goswin von Brederlow wrote:
> Alternatively write a re-map device-mapper target that reserves some
> space of the disk and remaps bad blocks itself.

That'd require some place to store the mapping so that the whole thing
still works after a reboot. Which should probably be on a different
disk.

I tend to want to move (part of) that problem to userspace; you may want
to do more than a simple remapping of a few blocks when that happens
(e.g. test-reading the surrounding area).
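
To make the persistence requirement concrete, here is a minimal userspace sketch of a remap table that survives a reboot. Everything in it is an assumption for illustration: the class name `RemapTable`, the JSON file format, and the idea of handing out spare sectors sequentially from a reserved area. It does not talk to the Device Mapper at all; it only shows the bookkeeping.

```python
import json
import os
import tempfile


class RemapTable:
    """Maps bad sectors to spare sectors and persists the mapping as JSON."""

    def __init__(self, path, spare_start, spare_count):
        self.path = path                # ideally a file on a *different* disk
        self.spare_start = spare_start  # first sector of the reserved area
        self.spare_count = spare_count  # number of spare sectors available
        self.mapping = {}               # bad sector -> spare sector
        if os.path.exists(path):
            with open(path) as f:
                # JSON object keys are strings, so convert them back to int.
                self.mapping = {int(k): v for k, v in json.load(f).items()}

    def remap(self, bad_sector):
        """Assign (or look up) the spare sector for a bad sector."""
        if bad_sector in self.mapping:
            return self.mapping[bad_sector]
        if len(self.mapping) >= self.spare_count:
            raise RuntimeError("spare area exhausted")
        spare = self.spare_start + len(self.mapping)
        self.mapping[bad_sector] = spare
        self._save()
        return spare

    def resolve(self, sector):
        """Return where a sector really lives (itself if not remapped)."""
        return self.mapping.get(sector, sector)

    def _save(self):
        # Write to a temporary file and rename it into place, so a crash
        # never leaves a half-written table behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.mapping, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

The saved table could then be turned into whatever mapping the kernel side needs; that translation is deliberately left out here.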
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18 8:17 UTC
To: Matthias Urlichs; +Cc: linux-raid

I've re-read this thread and I was wondering if:

    echo check > /sys/block/$array/md/sync_action

would help me (and possibly Matthias) in any way.

I have a RAID5 array of 8 disks running degraded on 7. One of the 7 has
bad sectors and the one that is not in the array also had bad sectors.

I zeroed the one out of the array (with dd) and then cloned the one with
bad sectors in the array to it using dd_rescue.

Later, I reassembled the array using the cloned disk instead of the
original.

So now, I'm sure I still have inconsistencies, but would doing the
action above force a correction? Also, would that work on a degraded
array?

Thank you.

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18 8:28 UTC
To: linux-raid

On Fri Sep 18, 2009 at 11:17:27AM +0300, Majed B. wrote:
> I've re-read this thread and I was wondering if: echo check >
> /sys/block/$array/md/sync_action would help me (and possibly Matthias)
> in any way.
> [...]
> So now, I'm sure I still have inconsistencies, but would doing the
> action above force a correction? Also, would that work on a degraded
> array?

All the 'check' action does is validate that the parity matches the
data. By doing this, it will also be doing a full read check on the
array (though I'm not certain what action is taken on read failures).
The 'repair' action will also rewrite any parity which doesn't match the
data.

All of this requires a non-degraded array, so I suspect the 'check' and
'repair' actions will get ignored altogether on a degraded array (and
certainly won't actually work). As the array is degraded, you _can't_
have any RAID inconsistencies. You may have some filesystem
inconsistencies (a fsck is definitely recommended) and/or data
inconsistencies (though unless you have checksums or backups to compare
against, you're stuck when it comes to finding these).

Cheers,
Robin
--
Robin Hill <robin@robinhill.me.uk>
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18 9:57 UTC
To: linux-raid

Thank you for the insight, Robin.

I already have used dd_rescue to find which sectors are bad, so I guess
I could either wait for Matthias to finish his modifications to mdadm,
or I can reconstruct the bad sectors manually (read same sector from
other disks, xor all, write to damaged disk's clone).

Weird thing though, is that when I re-read some of the bad sectors, I
didn't get I/O errors ... it's confusing!

Also, I'd rather avoid a fsck when I have bad sectors to not lose files.
I'll run fsck once I've fixed the bad sectors and resynced the array.

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18 10:22 UTC
To: linux-raid

On Fri Sep 18, 2009 at 12:57:23PM +0300, Majed B. wrote:
> I already have used dd_rescue to find which sectors are bad, so I
> guess I could either wait for Matthias to finish his modifications to
> mdadm, or I can reconstruct the bad sectors manually (read same sector
> from other disks, xor all, write to damaged disk's clone).

This won't work if your array is degraded though - you don't have enough
data to do the reconstruction (unless you have two failed drives you can
partially read?).

> Weird thing though, is that when I re-read some of the bad sectors, I
> didn't get I/O errors ... it's confusing!

Odd. I'd recommend using ddrescue rather than dd_rescue - it's faster
and handles retries of bad sectors better.

> Also, I'd rather avoid a fsck when I have bad sectors to not lose
> files. I'll run fsck once I've fixed the bad sectors and resynced the
> array.

True - a fsck should only be done once the data's in the best possible
state.

Cheers,
Robin
--
Robin Hill <robin@robinhill.me.uk>
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. @ 2009-09-18 10:52 UTC
To: linux-raid

Well, I think my case is different from Matthias's and I can't
reconstruct the data anymore, as you said, Robin.

So this leaves me with a degraded array with bad sectors and a dodgy
filesystem.

You see, I can mount the LVM Logical Volume (formatted with XFS), but as
soon as I hit some bad sectors, XFS complains and then one of the array
disks jumps out.

Just now, one disk exited the array and renamed itself from sdg to sdj
.... (this is the first time this has happened). According to
smartctl -a /dev/sdj, there are no bad sectors, but I still get this in
/var/log/messages:

Sep 18 07:01:38 Adam kernel: [316599.950147] sd 6:0:0:0: [sdg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 07:01:38 Adam kernel: [316599.950175] raid5:md0: read error not correctable (sector 1240859816 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950223] raid5:md0: read error not correctable (sector 1240859824 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950225] raid5:md0: read error not correctable (sector 1240859832 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950227] raid5:md0: read error not correctable (sector 1240859840 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950230] raid5:md0: read error not correctable (sector 1240859848 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950232] raid5:md0: read error not correctable (sector 1240859856 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950234] raid5:md0: read error not correctable (sector 1240859864 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950236] raid5:md0: read error not correctable (sector 1240859872 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950238] raid5:md0: read error not correctable (sector 1240859880 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950240] raid5:md0: read error not correctable (sector 1240859888 on sdg1).

When the disk exits the array, the array becomes useless (6 out of 8
disks) and XFS complains:

Sep 18 07:01:46 Adam kernel: [316607.896293] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896374] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896453] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.

Here's some info on smartctl -a /dev/sdg:

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I can't find an explanation for why the disks are behaving this way...

====================================================

Plan B: Since I cloned the disk with bad sectors to another, what would
happen if I zeroed the damaged one then cloned the clone to it?!

I do realize that there will be zeros in the areas of bad sectors, but
how will mdadm/md behave? Would a resync fail?

I can run fsck at that point and files residing on bad sectors will be
the only affected ones, correct?

--
Majed B.
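
For anyone wanting to turn log output like the above into a list of sectors to examine, a small Python sketch follows. It assumes the exact message wording shown in this mail ("raid5:md0: read error not correctable (sector N on sdg1)"); other kernel versions may word it differently. The function name `uncorrectable_sectors` is made up for this example.

```python
import re

# Matches the md message format quoted in the mail above. This is an
# assumption about one kernel's wording, not a stable interface.
LOG_PATTERN = re.compile(
    r"raid5:(?P<array>\w+): read error not correctable "
    r"\(sector (?P<sector>\d+) on (?P<member>\w+)\)"
)


def uncorrectable_sectors(lines):
    """Collect reported sectors, grouped by (array, member device)."""
    result = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            key = (match.group("array"), match.group("member"))
            result.setdefault(key, []).append(int(match.group("sector")))
    return result
```

Feeding it the lines of /var/log/messages would give a dictionary such as `{("md0", "sdg1"): [1240859816, 1240859824, ...]}`. Note that the sector numbers are whatever md reports for the member device; how they relate to offsets on the whole disk is not something this sketch works out.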
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Robin Hill @ 2009-09-18 11:15 UTC
To: linux-raid

On Fri Sep 18, 2009 at 01:52:14PM +0300, Majed B. wrote:
> You see, I can mount the LVM Logical Volume (formatted with XFS), but
> as soon as I hit some bad sectors, XFS complains and then one of the
> array disks jump out.
> Just now, one disk exited the array and renamed itself from sdg to sdj
> .... (this is the first time this happens). According to smartctl -a
> /dev/sdj, there are no bad sectors, but I still get this in
> /var/log/messages

The renaming would suggest a hard bus reset - not what I'd expect with
just a bad block.

> Here's some info on smartctl -a /dev/sdg
> [...]

A lot of these are only updated via offline tests, so won't change in
normal use, even if there are issues. Have you run any SMART tests on
the disk? The long test usually shows a failure if the disk has read
errors.

> Plan B: Since I cloned the disk with bad sectors to another, what
> would happen if I zeroed the damaged one then cloned the clone to it?!

Depends on what the actual condition of the disk is. The zeroing should
remap any bad blocks though.

> I do realize that there will be zeros in the areas of bad sectors, but
> how will mdadm/md behave? Would a resync fail?

mdadm doesn't care what data is on it, as long as the array metadata is
valid. Providing all disks are readable (and the new disk is writable)
then a resync would certainly work - whether the filesystem will be
usable afterwards depends on how many zeroed blocks there are and where
they fall.

> I can run fsck at that point and files residing on bad sectors will be
> the only affected ones, correct?

Files/directories yes - if the directory inodes get zeroed then all the
files within the directory will be affected (renamed & moved to
/lost+found). I've had to do just this myself recently, and despite the
low number of zeroed blocks, there was an awful lot of filesystem damage
(I ended up restoring most of it from backup).

Robin
--
Robin Hill <robin@robinhill.me.uk>
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Matthias Urlichs @ 2009-09-18 11:35 UTC
To: Majed B.; +Cc: linux-raid

On Fri, 2009-09-18 at 11:17 +0300, Majed B. wrote:
> I have a RAID5 array of 8 disks running degraded on 7. One of the 7
> has bad sectors and the one that is not in the array also had bad
> sectors.

If you run a check on a degraded array and the check runs into errors it
can't recover from, I assume that the disk will get kicked off and
you'll have a nonfunctional array instead. Not something I'd do in your
situation.

I'll try to finish my patch ASAP. It should be possible to convince the
code to read from the offline disk when absolutely necessary, but no
guarantee that I'll get that in right away. (On second thought, this
only matters for RAID6.)
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: John Robinson @ 2009-09-18 17:44 UTC
To: Linux RAID

On 18/09/2009 12:35, Matthias Urlichs wrote:
[...]
> If you run a check on a degraded array and the check runs into errors it
> can't recover from, I assume that the disk will get kicked off and
> you'll have a nonfunctional array instead.

No, I don't think so - at least with RAID-1, md doesn't drop the array
on errors on the one remaining functional disc, on the grounds that some
data is better than none, but I don't know whether the array gets
switched to read-only or what the situation is with other RAID levels.

Cheers,
John.
* Re: Re-map disk sectors in userspace when rewriting after read errors
From: Greg Freemyer @ 2009-09-18 18:02 UTC
To: John Robinson; +Cc: Linux RAID

All,

I keep forgetting to ask, but the subject of this thread makes me wonder
if you guys are familiar with the hdparm features of
"--make-bad-sector", "--read-sector", and "--write-sector".

I don't know if any of those can be used to force a sector to be
remapped, but I could see a user space process like:

  - identify corrupt sector
  - hdparm --make-bad-sector (to get it as corrupt as Linux knows how)
  - calculate correct value
  - write new value to sector the normal way (hopefully the drive will
    remap the bad sector)

hdparm --read-sector will do a low level read of the sector, including
the sector header and checksum as I understand it. I'm not sure all that
gets back to userspace.

hdparm --write-sector will force a sector to be rewritten. I don't
believe it is meant to ever cause a sector remap. Of course you never
know what a disk drive is going to do for any given command.

Mark Lord is of course the expert on all things hdparm.

Greg
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-18 18:02 ` Greg Freemyer
@ 2009-09-18 20:13 ` Majed B.
2009-10-02 13:55 ` Bill Davidsen
0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-18 20:13 UTC (permalink / raw)
To: Linux RAID
Greg,

You don't really need to use hdparm. You can use dd to overwrite the
bad sectors with zeros, which forces the disk to remap the sector.

As for calculating the new data, a friend of mine wrote me a Java
program that takes any number of input files, XORs them, and writes
the output to a file. The input files are the sectors' data from the
other disks.

I have attached the program in case anyone is interested. Credit goes
to Eng. Hisham Farahat, who wrote the program ("sector xor", or sexor,
as I call it).

java -jar sexor.jar file1 file2 ... fileN

The output file will always be called "out" -- do not include it in
the input list.

--
Majed B.

[-- Attachment #2: sexor.jar --]
[-- Type: application/java-archive, Size: 4054 bytes --]
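For readers without the attachment, the XOR step described above can be
sketched in a few lines of Python. This is not the attached sexor.jar,
whose source is not shown in the thread; the names xor_blocks and
xor_files are made up for illustration, and only the default output
name "out" is borrowed from the message.

```python
from functools import reduce
from pathlib import Path


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def xor_files(inputs, output="out"):
    """Write the XOR of all input files to `output` (hypothetical helper)."""
    data = xor_blocks(Path(path).read_bytes() for path in inputs)
    Path(output).write_bytes(data)
    return data
```

On RAID5, feeding it the matching sector from every surviving member
(data and parity alike) yields the content of the missing sector,
because the parity is the XOR of the data blocks in the same stripe.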
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-18 20:13 ` Majed B.
@ 2009-10-02 13:55 ` Bill Davidsen
0 siblings, 0 replies; 33+ messages in thread
From: Bill Davidsen @ 2009-10-02 13:55 UTC (permalink / raw)
To: Majed B.; +Cc: Linux RAID
Majed B. wrote:
> Greg,
>
> You don't really need to use hdparm. You can use dd to overwrite the
> bad sectors with zeros which forces the disk to remap the sector.
>
From the description of the problem, I would expect the md code to have
rewritten the sector, and the problem is that the failed write isn't
detected or somehow the write doesn't cause a relocate. That's my
reading of the previous discussion: the disk firmware is crap.

Newegg.com had TB drives on sale for about $65 or so. It is hard to
justify the time to live with crap, not to mention that the same grotty
firmware which isn't getting the bad block remapped may be returning
bad data without warning. That would bother me.

--
Bill Davidsen <davidsen@tmr.com>
Unintended results are the well-earned reward for incompetence.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 6:23 Re-map disk sectors in userspace when rewriting after read errors Matthias Urlichs
2009-09-15 6:45 ` berk walker
2009-09-15 7:13 ` Alex Butcher
@ 2009-09-15 10:40 ` Majed B.
2009-09-15 10:52 ` Matthias Urlichs
2 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 10:40 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
Hello,

I'm facing a similar problem now with 2 disks. The
Current_Pending_Sectors and Offline_Uncorrectable counts are both
higher than 100, on a RAID5.

The SMART monitoring tools failed to report these after each test, so
now I'm battling through...

I'm running the array degraded, and yesterday, while trying to copy the
data to another array (5.5TB), one disk dropped out of the dodgy array
and caused I/O errors... It won't even resync to another disk beyond
15.6%.

Currently, I'm cloning with dd_rescue and hoping to be able to copy
most of the data, and accept some data loss...

Would anyone suggest a better solution?

P.S.: The disks in question are WD, model: WDC WD10EACS-00ZJB0. I have
other WD disks and they're intact and have zero bad sectors...

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 10:40 ` Majed B.
@ 2009-09-15 10:52 ` Matthias Urlichs
2009-09-15 11:03 ` Majed B.
0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 10:52 UTC (permalink / raw)
To: linux-raid
On Tue, 15 Sep 2009 13:40:44 +0300, Majed B. wrote:

> Would anyone suggest a better solution?

You should tell ddrescue to log which sectors it failed to copy. You can
then recover the missing data by reading the stuff at that offset from
the other disks, and XORing the bytes.

I plan to write a program which does that (and which also understands
RAID1 and RAID6). How long can you survive without your data?

--
Matthias Urlichs
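As a rough illustration of the first half of that suggestion, the
sketch below pulls the failed areas out of a GNU ddrescue log (called
a mapfile in current releases). It assumes the layout documented for
GNU ddrescue: "#" comment lines, one status line, then "pos size
status" lines in hexadecimal, with "-" marking bad sectors. dd_rescue
is a different tool with a different log, so this would not apply to
it as written. The function name bad_ranges is made up for this
example.

```python
def bad_ranges(mapfile_text, sector_size=512):
    """Return (first_sector, sector_count) for every block marked '-'."""
    data_lines = [
        line.split()
        for line in mapfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    ranges = []
    # The first non-comment line is the status line (current position
    # and status), not a block entry, so it is skipped.
    for fields in data_lines[1:]:
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status == "-":
            first = pos // sector_size
            last = (pos + size - 1) // sector_size
            ranges.append((first, last - first + 1))
    return ranges
```

Each returned range is an offset on the failing member; the same
offsets are then read from the remaining members for the XOR step.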
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 10:52 ` Matthias Urlichs
@ 2009-09-15 11:03 ` Majed B.
2009-09-15 17:02 ` Majed B.
0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 11:03 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
I've been trying to migrate for 2 weeks. I can wait another 2 ... maybe
3 weeks.

Just to be clear, I'm using dd_rescue, not ddrescue (which is the GNU
one). I read about the log option but forgot to use it... now I've
wasted over 20 hours... ugh ... /smacks self

That would be a very useful program for cases like this!

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 11:03 ` Majed B.
@ 2009-09-15 17:02 ` Majed B.
2009-09-15 18:05 ` Matthias Urlichs
0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 17:02 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
Matthias,

Out of curiosity, how will you find the sectors/blocks that
reconstruct a certain bad sector? Is the data spread to the same block
number on all disks?

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 17:02 ` Majed B.
@ 2009-09-15 18:05 ` Matthias Urlichs
2009-09-15 18:14 ` Majed B.
0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 18:05 UTC (permalink / raw)
To: Majed B.; +Cc: linux-raid
On Tue, 2009-09-15 at 20:02 +0300, Majed B. wrote:
> Out of curiosity, how will you find the sectors/blocks that
> reconstruct a certain bad sector? Is the data spread to the same block
> number on all disks?

Yes. It's a byte-level operation, actually.

The only part that's moderately tricky is, on RAID6, to determine which
partition the Q drive is. Fortunately, mdadm already contains (almost)
all the necessary logic.
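To make the "which member holds Q" point concrete, here is a small
sketch of where P and Q land per stripe. It assumes the array uses
md's left-symmetric layout, which is the usual default but not the
only one, and the rotation formula is written from my reading of the
md driver, not copied from mdadm; check the layout reported by
"mdadm --examine" before trusting it. The function name parity_disks
is made up for this example.

```python
def parity_disks(stripe, raid_disks, level=6):
    """Return (p_index, q_index) for a stripe number.

    Assumes the left-symmetric layout. q_index is None for RAID5,
    which has no Q block.
    """
    if level not in (5, 6):
        raise ValueError("only RAID5 and RAID6 are handled here")
    # P starts on the last member and moves one member down per stripe.
    p_index = raid_disks - 1 - stripe % raid_disks
    if level == 5:
        return (p_index, None)
    # On RAID6, Q sits on the member right after P, wrapping around.
    return (p_index, (p_index + 1) % raid_disks)
```

For plain XOR recovery on RAID5 the position of P does not matter,
since every surviving block of the stripe is XORed together. On RAID6
it matters because Q is not a plain XOR and has to be left out of (or
handled separately from) the P calculation.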
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 18:05 ` Matthias Urlichs
@ 2009-09-15 18:14 ` Majed B.
2009-09-15 18:44 ` Matthias Urlichs
0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-15 18:14 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
Hmm, so I guess I'm luckier since I run RAID5? (or not because I have
2 bad disks? :p)

When do you expect to have a working application done, by the way?

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 18:14 ` Majed B.
@ 2009-09-15 18:44 ` Matthias Urlichs
2009-09-16 9:31 ` Majed B.
0 siblings, 1 reply; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-15 18:44 UTC (permalink / raw)
To: Majed B.; +Cc: linux-raid
On Tue, 2009-09-15 at 21:14 +0300, Majed B. wrote:
> Hmm, so I guess I'm luckier since I run RAID5? (or not because I have
> 2 bad disks? :p)
>
Well, depends on whether you have two errors in the same sector. If not,
you're going to be lucky.

> When do you expect to have a working application done, by the way?
>
Hopefully later this week. It'll probably be a patch to mdadm's
development branch of some sort.

Neil: In order to do that, I need to read badblock map files for some
(or all) disks, in GNU dd_rescue's format preferably. Do you have a
preference WRT how to tell mdadm about these?

I tend towards "mdadm --recover 0:foo,2:bar DISK_DEVICE...". This would
tell mdadm that the badblock map for disk 0 is in file 'foo', the map
for disk 2 is in 'bar', and the other disks are supposed to be cleanly
read/writeable.

mdadm would then read RAID info from these devices, make sure it's
consistent (or "mostly consistent" if using --force), read the bad block
map, recover the data that's indicated to be bad and write it to the
partitions in question, and zero out the blocks that are unrecoverable
(and restore P+Q vectors for them).
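The "0:foo,2:bar" argument is only a proposal made in this message,
not an option that mdadm is known to accept. Purely to illustrate how
such a specification could be read, here is a sketch of a parser; the
function name parse_badblock_spec and the error wording are invented
for this example.

```python
def parse_badblock_spec(spec):
    """Turn '0:foo,2:bar' into {0: 'foo', 2: 'bar'}.

    Keys are member indexes, values are bad-block map file names.
    Members not listed are assumed to be cleanly readable.
    """
    maps = {}
    for item in spec.split(","):
        index, separator, path = item.partition(":")
        if not separator or not index.isdigit() or not path:
            raise ValueError(f"expected INDEX:FILE, got {item!r}")
        if int(index) in maps:
            raise ValueError(f"member {index} listed twice")
        maps[int(index)] = path
    return maps
```

Splitting on the first ":" only keeps file names that themselves
contain a colon usable, at the cost of not allowing commas in them.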
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-15 18:44 ` Matthias Urlichs
@ 2009-09-16 9:31 ` Majed B.
2009-09-16 9:44 ` Matthias Urlichs
0 siblings, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-16 9:31 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
Matthias,

I have a question which will probably sound stupid: If I have a bad
blocks output file from dd_rescue, can I reconstruct a bad sector's
data by reading the same sector from all disks (using dd if=/dev/sdx
of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run a
normal XOR operation, write zeros to the bad block to force a sector
remap, then dd the XOR output to the said sector?

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-16 9:31 ` Majed B.
@ 2009-09-16 9:44 ` Matthias Urlichs
2009-09-16 9:52 ` Majed B.
2009-09-16 10:00 ` Robin Hill
0 siblings, 2 replies; 33+ messages in thread
From: Matthias Urlichs @ 2009-09-16 9:44 UTC (permalink / raw)
To: Majed B.; +Cc: linux-raid
On Wed, 2009-09-16 at 12:31 +0300, Majed B. wrote:
> I have a question which would probably sound stupid: If I have a bad
> blocks output file from dd_rescue, can I reconstruct a bad sector's
> data by reading the same sector from all disks (using dd if=/dev/sdx
> of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run an
> normal XOR operation, write zeros to the bad block to force sector
> remap, then dd the XOR output to the said sector?

Well, of course. Assuming that the disk's sector remap works, which was
my problem, and that we're talking about RAID5.
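The read / XOR / write-back sequence discussed here can be sketched as
follows for RAID5, under the assumptions named in the thread: the
drive does remap on write, and the same sector number on every member
belongs to the same stripe. The helper names are invented, the
sector size of 512 bytes matches the bs=512 in the quoted command, and
the paths would be member devices or image files. One detail about the
quoted dd command: skip= counts whole blocks from zero, so
"skip=bb_number-1" is only right if the bad block list counts from 1.

```python
import os

SECTOR = 512  # matches bs=512 in the dd command quoted above


def read_sector(path, lba, sector_size=SECTOR):
    """Read one sector; `lba` is zero-based, like dd's skip= value."""
    with open(path, "rb") as dev:
        dev.seek(lba * sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError(f"short read at sector {lba} of {path}")
    return data


def rebuild_sector(good_paths, lba, sector_size=SECTOR):
    """XOR the same sector from every surviving RAID5 member."""
    result = bytearray(sector_size)
    for path in good_paths:
        for i, byte in enumerate(read_sector(path, lba, sector_size)):
            result[i] ^= byte
    return bytes(result)


def write_sector(path, lba, data, sector_size=SECTOR):
    """Overwrite one sector in place and push it out to the device."""
    if len(data) != sector_size:
        raise ValueError("data must be exactly one sector long")
    with open(path, "r+b") as dev:
        dev.seek(lba * sector_size)
        dev.write(data)
        dev.flush()
        os.fsync(dev.fileno())
```

Writing the rebuilt data directly should make the separate
zero-filling pass unnecessary, since either write gives the drive the
same opportunity to remap. This sketch does not bypass the page cache;
on a real block device one would likely want direct I/O as well.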
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-16 9:44 ` Matthias Urlichs
@ 2009-09-16 9:52 ` Majed B.
2009-09-16 13:05 ` Alex Butcher
2009-09-16 10:00 ` Robin Hill
1 sibling, 1 reply; 33+ messages in thread
From: Majed B. @ 2009-09-16 9:52 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: linux-raid
That's good, I guess, but I ran into what seems to be a problem
yesterday.

I've mentioned before that I have 8 disks in an array: 7 of them
belong to it (degraded), and one doesn't. That outsider disk had bad
sectors. I wrote zeros to the disk yesterday and both the Pending and
Offline counts have been reset, but the Reallocation count didn't
increase. I did run an immediate offline SMART test after zeroing the
disk...

Does that make sense?!

--
Majed B.
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-16 9:52 ` Majed B.
@ 2009-09-16 13:05 ` Alex Butcher
0 siblings, 0 replies; 33+ messages in thread
From: Alex Butcher @ 2009-09-16 13:05 UTC (permalink / raw)
To: Majed B.; +Cc: Matthias Urlichs, linux-raid
On Wed, 16 Sep 2009, Majed B. wrote:

> I wrote zeros to the disk yesterday and both Pending and Offline counts
> have been reset, but Reallocation count didn't increase.

Soft, rather than hard errors, presumably. These can occur if a drive is
writing when power is unexpectedly removed.

HTH,
Alex
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-16 9:44 ` Matthias Urlichs
2009-09-16 9:52 ` Majed B.
@ 2009-09-16 10:00 ` Robin Hill
2009-09-16 10:07 ` Majed B.
1 sibling, 1 reply; 33+ messages in thread
From: Robin Hill @ 2009-09-16 10:00 UTC (permalink / raw)
To: linux-raid
On Wed Sep 16, 2009 at 11:44:26AM +0200, Matthias Urlichs wrote:

> On Wed, 2009-09-16 at 12:31 +0300, Majed B. wrote:
> > I have a question which would probably sound stupid: If I have a bad
> > blocks output file from dd_rescue, can I reconstruct a bad sector's
> > data by reading the same sector from all disks (using dd if=/dev/sdx
> > of=./bbfix_#number bs=512 count=1 skip=bb_number-1), then run an
> > normal XOR operation, write zeros to the bad block to force sector
> > remap, then dd the XOR output to the said sector?
>
> Well, of course. Assuming that the disk's sector remap works, which was
> my problem, and that we're talking about RAID5.
>
And also assuming that the array starts from the same sector of each
disk.

Cheers,
    Robin
--
     ___
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
* Re: Re-map disk sectors in userspace when rewriting after read errors
2009-09-16 10:00 ` Robin Hill
@ 2009-09-16 10:07 ` Majed B.
0 siblings, 0 replies; 33+ messages in thread
From: Majed B. @ 2009-09-16 10:07 UTC (permalink / raw)
To: linux-raid
Thank you for the heads up, Robin.

I've just checked and it seems that they do start from the same sector:

/dev/sdg: WDC WD10EADS-65L5B1
/dev/sdh: MAXTOR STM31000340AS

root@Adam:/boot# fdisk -l /dev/sd[g-h]

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1      121601   976760001   fd  Linux raid autodetect

There are other disks in the array, but the rest are all WD disks and
have a similar structure to the one above.

--
Majed B.
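The fdisk figures above can be cross-checked with a little arithmetic.
The sketch below recomputes the cylinder size and the 1K block count
from the printed geometry. The assumption that the partition begins at
sector 63 (the first track being reserved) is an inference: fdisk
printed only cylinder numbers here, but the block count comes out
exact under that assumption. The function names are made up for this
example.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_BYTES = 512

# Values taken from the "Units = cylinders of 16065 * 512" line.
SECTORS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK
CYLINDER_BYTES = SECTORS_PER_CYLINDER * SECTOR_BYTES


def partition_sectors(start_cyl, end_cyl, first_track_reserved=True):
    """Sectors in a partition given in 1-based, inclusive cylinders."""
    total = (end_cyl - start_cyl + 1) * SECTORS_PER_CYLINDER
    if first_track_reserved:
        # Assumed: the partition starts one track (63 sectors) into
        # its first cylinder, so those sectors are not part of it.
        total -= SECTORS_PER_TRACK
    return total


def blocks_1k(sectors):
    """Convert a sector count to the 1K 'Blocks' column of fdisk."""
    return sectors * SECTOR_BYTES // 1024
```

With start cylinder 1 and end cylinder 121601 this gives 1953520002
sectors, or 976760001 blocks, matching both partitions listed. Note
that identical partition tables only show that the partitions line up.
Whether the array data begins at the same offset inside each partition
also depends on the md superblock format, which "mdadm --examine"
reports per member.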