* How to fix Current_Pending_Sector?
From: Iain Rauch @ 2010-03-11 11:51 UTC
To: LinuxRaid

Smartd emailed me to say I have "1 Currently unreadable (pending) sectors". This actually happened for two disks now.

I ran a check and then a repair on my array and they both gave mismatch_cnt of 8.

I ran a long self-test on both and they completed without error with no errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one disk also has a 'UDMA_CRC_Error_Count' of 1.

I ran 'hdrecover' on both and they are both telling me "Couldn't recover sector 2930277168". It's asking if I want to overwrite it with zeros to fix it, but I would assume this will damage my array?

The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes for the array components. Does that sector fall outside my partition, and hence would it be safe to overwrite it with zeros?

Also, why did I have a mismatch_cnt? I haven't run another check since I did the repair, as I wanted to fix the pending sector.

BTW, I have a 15 drive RAID6.

Hope y'all can help.

Iain
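The question of where sector 2930277168 sits relative to the disk and the partition can be checked with plain arithmetic. The sketch below is only an illustration and rests on assumptions the thread does not confirm: 512-byte logical sectors, "1500250M" meaning 1500250 x 10^6 bytes (read as MiB it would be larger than the stated disk size), and hdrecover counting sectors from 0 over the whole disk.

```python
# Assumptions (not confirmed in the thread): 512-byte logical sectors,
# "M" = 10**6 bytes, sector numbers counted from 0 over the whole disk.
SECTOR_SIZE = 512
DISK_BYTES = 1500301910016          # disk size quoted in the post
PARTITION_BYTES = 1500250 * 10**6   # "1500250M" read as decimal megabytes
REPORTED_SECTOR = 2930277168        # sector number printed by hdrecover


def sector_count(size_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Number of whole sectors that fit in size_bytes."""
    return size_bytes // sector_size


def is_addressable(sector: int, disk_bytes: int) -> bool:
    """True if a 0-based sector number lies on a disk of disk_bytes."""
    return 0 <= sector < sector_count(disk_bytes)


disk_sectors = sector_count(DISK_BYTES)
partition_sectors = sector_count(PARTITION_BYTES)

print("sectors on disk:          ", disk_sectors)
print("last valid sector (LBA):  ", disk_sectors - 1)
print("sectors in partition:     ", partition_sectors)
print("reported sector on disk?  ", is_addressable(REPORTED_SECTOR, DISK_BYTES))
print("sectors past partition len:", REPORTED_SECTOR - partition_sectors)
```

Under these assumptions the reported number equals the total sector count of the disk, so it would be one past the last addressable sector and well beyond the partition length, unless the partition starts unusually far into the disk. That may say more about how hdrecover numbers sectors than about where the pending sector really is.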
* Re: How to fix Current_Pending_Sector?
From: Michael Evans @ 2010-03-11 12:06 UTC
To: Iain Rauch; +Cc: LinuxRaid

If you are running RAID6 and it can read from all but two drives, then it should still be able to calculate, from the remaining (presumed good) reads, whatever is needed to fill in the other two drives. Recent kernels will try to write over failed sectors automatically, and only kick the drive if the write fails.

Please provide more information:

Kernel version
mdadm version

Information about how the source block devices are split up before mdadm sees them, and any related messages from the system log. The relevant section should be near the end of the dmesg output when you've just completed a check or repair. Your syslog probably already captured the same data and stored it elsewhere.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: How to fix Current_Pending_Sector?
From: Iain Rauch @ 2010-03-11 12:25 UTC
To: Michael Evans; +Cc: LinuxRaid

I thought doing the repair was supposed to fix the issue, but it didn't seem to touch it. I wonder if it is outside what md sees, but then how would it have been noticed as unreadable? And is it coincidence that both drives have the same unreadable sector?

root@Edna:/home/iain# uname -a
Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009 x86_64 GNU/Linux
root@Edna:/home/iain# mdadm -V
mdadm - v2.6.9 - 10th March 2009

I paste the end of messages below. There's loads of that all the way through doing the repair so I'm not sure how to filter out the useful bits.

Iain

Mar 10 07:21:21 Edna -- MARK --
Mar 10 07:29:48 Edna kernel: [135073.510019] Modules linked in: appletalk video output input_polldev nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc xfs bonding lp ppdev psmouse pcspkr k8temp serio_raw i2c_piix4 r8168 snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport shpchp ohci1394 ieee1394 sata_mv raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
Mar 10 07:29:48 Edna kernel: [135073.510019] CPU 0:
Mar 10 07:29:48 Edna kernel: [135073.510019] Modules linked in: appletalk video output input_polldev nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc xfs bonding lp ppdev psmouse pcspkr k8temp serio_raw i2c_piix4 r8168 snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport shpchp ohci1394 ieee1394 sata_mv raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
Mar 10 07:29:48 Edna kernel: [135073.510019] Pid: 1005, comm: md1_raid5 Not tainted 2.6.28-16-server #57-Ubuntu
Mar 10 07:29:48 Edna kernel: [135073.510019] RIP: 0010:[<ffffffffa007f7c9>] [<ffffffffa007f7c9>] raid6_sse24_gen_syndrome+0x1e9/0x28a [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] RSP: 0018:ffff88012bd0db58 EFLAGS: 00000297
Mar 10 07:29:48 Edna kernel: [135073.510019] RAX: ffff8800ac397000 RBX: ffff88012bd0db90 RCX: ffff8800ac3978a0
Mar 10 07:29:48 Edna kernel: [135073.510019] RDX: ffff8800ac397880 RSI: 0000000000001000 RDI: 00000000ffffffff
Mar 10 07:29:48 Edna kernel: [135073.510019] RBP: ffff88012bd0db90 R08: 0000000000000880 R09: 00000000000008a0
Mar 10 07:29:48 Edna kernel: [135073.510019] R10: 00000000000008b0 R11: 0000000000000890 R12: ffff88012bd0db48
Mar 10 07:29:48 Edna kernel: [135073.510019] R13: ffff88012bd0db48 R14: ffff88012bd0dae0 R15: ffff88012f214000
Mar 10 07:29:48 Edna kernel: [135073.510019] FS: 00007f05d81076f0(0000) GS:ffffffff80a9b000(0000) knlGS:0000000000000000
Mar 10 07:29:48 Edna kernel: [135073.510019] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
Mar 10 07:29:48 Edna kernel: [135073.510019] CR2: 00007fdd92599760 CR3: 0000000000201000 CR4: 00000000000006a0
Mar 10 07:29:48 Edna kernel: [135073.510019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 10 07:29:48 Edna kernel: [135073.510019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 10 07:29:48 Edna kernel: [135073.510019] Call Trace:
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa007f811>] ? raid6_sse24_gen_syndrome+0x231/0x28a [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa0076d9a>] compute_parity6+0x20a/0x380 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa0078696>] handle_parity_checks6+0x1d6/0x360 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa007c507>] handle_stripe6+0xb07/0xbd0 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa007d395>] handle_stripe+0x25/0x30 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffffa007d9f7>] raid5d+0x1f7/0x300 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff8056864c>] md_thread+0x5c/0x140
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff80268a50>] ? autoremove_wake_function+0x0/0x40
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff805685f0>] ? md_thread+0x0/0x140
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff802685e9>] kthread+0x49/0x90
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff80213979>] child_rip+0xa/0x11
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff802685a0>] ? kthread+0x0/0x90
Mar 10 07:29:48 Edna kernel: [135073.510019] [<ffffffff8021396f>] ? child_rip+0x0/0x11
Mar 10 07:33:03 Edna kernel: [135268.444637] md: md1: requested-resync done.
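The check and repair passes discussed in this thread are normally requested through md's sysfs files, and mismatch_cnt is read back from the same directory. The following is a minimal sketch of that, assuming the array is md1 (the name that appears in the log) and the usual /sys/block/<array>/md layout. The helper names are made up for illustration, and the `root` parameter exists only so the functions can be tried against a scratch directory instead of a live system.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")  # usual sysfs location on Linux


def md_dir(array: str, root: Path = SYSFS_BLOCK) -> Path:
    """Directory holding md's control files for an array such as 'md1'."""
    return root / array / "md"


def start_scrub(array: str, action: str = "check", root: Path = SYSFS_BLOCK) -> None:
    """Ask md for a 'check' (count mismatches) or 'repair' (also rewrite them).

    Writing here needs root privileges on a real system.
    """
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action!r}")
    (md_dir(array, root) / "sync_action").write_text(action + "\n")


def mismatch_count(array: str, root: Path = SYSFS_BLOCK) -> int:
    """Read mismatch_cnt as left by the most recent check or repair."""
    return int((md_dir(array, root) / "mismatch_cnt").read_text().strip())
```

On the machine in question, `start_scrub("md1", "check")` followed, once the pass has finished, by `mismatch_count("md1")` would correspond to the check that reported 8.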
* Re: How to fix Current_Pending_Sector?
From: Stefan /*St0fF*/ Hübner @ 2010-03-11 16:54 UTC
To: linux-raid

Hi Iain,

"Current_Pending_Sector" is a SMART attribute which gets incremented during online operation (reading and writing sectors) AND during offline drive scanning (also called SMART Data Collection), when the drive finds that a sector cannot be read correctly at the first try (offline data collection) or after applying various error-correction techniques.

The easiest way to get rid of this problem: dd a sector of zeros onto the broken sector, then fail the drive and re-add it. Now wait until the resync is done.

What I'm not sure about is whether one should fail and re-add both drives at once, as the redundancy would be lost by doing so...

Speaking about redundancy: our rule of thumb (at xtivate.de) is "each 4 drives need one redundancy" - so a redundancy of 2 with 15 drives is kind of playing with your luck...

Good luck,
Stefan
* Re: How to fix Current_Pending_Sector?
From: Iain Rauch @ 2010-03-15 11:20 UTC
To: st0ff, linux-raid

Well, I failed one of the drives and allowed 'hdrecover' to overwrite the unreadable sector, but it still couldn't fix it. Here's its report:

Wiping sector 2930277168...
Checking sector is now readable...
I still couldn't read the sector!
I'm sorry, but even writing to the sector hasn't fixed it - there's nothing more I can do!
Summary:
1 bad sectors found
of those 0 were recovered
and 1 could not be recovered and were destroyed causing data loss

The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive. I guess I could have just done part of it, but I suppose that verified the whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the Current_pending_sectors count back to zero. Still no error reports in the SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have gone up to one?

I re-partitioned and added it to the array and it rebuilt fine in ~12 hours.

Repeated the process with the second drive and everything's back to normal.

The drive that had the 'UDMA_CRC_Error_Count' still says 1, but I don't think I need to worry about that?

In direct reply to Stefan:

I think you meant to dd zeros onto the drive /after/ failing it - would have caused corruption otherwise?

I definitely think it made sense to do one at a time.

One parity drive for every four seems a bit extreme, especially when you have a backup (which I don't). I'm fairly happy with 15 drives in RAID 6. I had 24 drives before, and that did give me a few problems :p Just need to keep the drives healthy. (Array scrubs, SMART tests etc).

Iain
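For reference, the single-sector variant of this procedure (Stefan's suggestion, in the order Iain argues for: fail the member first, then overwrite) could look roughly like the command list below. This is a dry-run sketch that only prints commands and never executes them. The device names /dev/sdX and /dev/sdX1 and the sector number are placeholders, not values from the thread; /dev/md1 is taken from the earlier log; `oflag=direct` is GNU dd's option for bypassing the page cache and is an addition here, not something the posters mention.

```python
import shlex


def zero_sector_cmd(disk: str, sector: int, sector_size: int = 512) -> list[str]:
    """dd invocation that overwrites exactly one sector with zeros.

    'seek' is counted in units of 'bs' from the start of 'disk', so a
    whole-disk sector number needs the whole-disk device, not a partition.
    """
    return [
        "dd", "if=/dev/zero", f"of={disk}",
        f"bs={sector_size}", f"seek={sector}", "count=1", "oflag=direct",
    ]


def single_sector_cycle(array: str, member: str, disk: str, sector: int) -> list[list[str]]:
    """Commands for one drive: fail, remove, zero the sector, add back."""
    return [
        ["mdadm", array, "--fail", member],
        ["mdadm", array, "--remove", member],
        zero_sector_cmd(disk, sector),
        ["mdadm", array, "--add", member],
    ]


# Placeholder arguments only; nothing is run, the commands are just printed.
for argv in single_sector_cycle("/dev/md1", "/dev/sdX1", "/dev/sdX", 123456):
    print(shlex.join(argv))
```

Handling one drive per cycle and waiting for the rebuild to finish before touching the next matches what Iain did, and keeps one level of redundancy in place throughout.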
* Re: How to fix Current_Pending_Sector?
From: CoolCold @ 2010-03-18 17:35 UTC
To: Iain Rauch; +Cc: st0ff, linux-raid

I had a similar issue: there were 5 "Currently unreadable (pending) sectors" and 1 "Offline uncorrectable sectors", and then the drive was kicked out of the RAID. Re-adding the drive helped, and that bad sector went away. Now there are 2 pending and 1 uncorrectable, so I'm going to fix those two.

My question is: are there any ways to resync the array faster? Say I update bitmaps from the current 0.9, fail the drive, dd over the sectors, and add the drive back: will the bitmap help to resync only the parts which have changed, rather than the whole drive?

--
Best regards,
[COOLCOLD-RIPN]
* Re: How to fix Current_Pending_Sector?
From: David Rees @ 2010-03-18 19:37 UTC
To: Iain Rauch; +Cc: st0ff, linux-raid

On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch <groups@email.iain.rauch.co.uk> wrote:
> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
> I guess I could have just done part of it, but I suppose that verified the
> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
> Current_pending_sectors count back to zero. Still no error reports in the
> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
> gone up to one?

No - the drive was able to successfully write to the sector it was unable to read from. If the write had failed, it would have reallocated the sector.

-Dave
* Re: How to fix Current_Pending_Sector?
From: Greg Freemyer @ 2010-03-18 21:47 UTC
To: David Rees; +Cc: Iain Rauch, st0ff, linux-raid

On Thu, Mar 18, 2010 at 3:37 PM, David Rees <drees76@gmail.com> wrote:
> On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch
> <groups@email.iain.rauch.co.uk> wrote:
>> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
>> I guess I could have just done part of it, but I suppose that verified the
>> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
>> Current_pending_sectors count back to zero. Still no error reports in the
>> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
>> gone up to one?
>
> No - the drive was able to successfully write to the sector it was
> unable to read from. If the write had failed, it would have
> reallocated the sector.
>
> -Dave

Dave,

Most sector writes are blind (ie. non-verified).

Is your theory that if the sector is marked as a Pending_Bad_Sector a write is done, but it is verified, and a reallocate only occurs if the verify fails?

I've never heard that theory, but it makes great sense.

Greg
* Re: How to fix Current_Pending_Sector?
From: Stefan /*St0fF*/ Hübner @ 2010-03-19 7:22 UTC
Cc: linux-raid

Hi Greg,

On 18.03.2010 22:47, Greg Freemyer wrote:
> On Thu, Mar 18, 2010 at 3:37 PM, David Rees <drees76@gmail.com> wrote:
>> On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch
>> <groups@email.iain.rauch.co.uk> wrote:
>>> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
>>> I guess I could have just done part of it, but I suppose that verified the
>>> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
>>> Current_pending_sectors count back to zero. Still no error reports in the
>>> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
>>> gone up to one?
>>
>> No - the drive was able to successfully write to the sector it was
>> unable to read from. If the write had failed, it would have
>> reallocated the sector.
>>
>> -Dave
>
> Dave,
>
> Most sector writes are blind (ie. non-verified).

That is certainly right!

> Is your theory that if the sector is marked as a Pending_Bad_Sector a
> write is done, but it is verified, and a reallocate only occurs if the
> verify fails?

If the drive has noted erroneous behaviour on a sector (i.e. marked it pending), it will try to resolve the problem by verifying the write. It only makes sense that way, doesn't it?

> I've never heard that theory, but it makes great sense.

I see, it does ;)

> Greg

Stefan
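The theory agreed on in these last messages can be written down as a toy model: a write to a sector marked pending is verified by the drive, the pending count drops either way, and a reallocation is only recorded when the verify fails. The sketch below models that reading of the thread, not documented firmware behaviour; the class and function names are invented for illustration, and only the two attribute names come from the posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartCounters:
    """The two SMART attributes discussed in the thread."""
    current_pending_sector: int = 0
    reallocated_event_count: int = 0


def write_to_pending_sector(counters: SmartCounters, verify_ok: bool) -> SmartCounters:
    """Counters after one write lands on a sector marked pending.

    Thread's theory: the drive verifies the write. If the verify succeeds
    the sector is simply no longer pending; if it fails the sector is
    reallocated, which also clears the pending state.
    """
    if counters.current_pending_sector == 0:
        return counters  # nothing pending, ordinary blind write
    return SmartCounters(
        current_pending_sector=counters.current_pending_sector - 1,
        reallocated_event_count=counters.reallocated_event_count + (0 if verify_ok else 1),
    )


before = SmartCounters(current_pending_sector=1, reallocated_event_count=0)
print("verify ok:    ", write_to_pending_sector(before, verify_ok=True))
print("verify failed:", write_to_pending_sector(before, verify_ok=False))
```

The first case matches what Iain reported after zeroing the drive: the pending count went back to zero while 'Reallocated_Event_Count' stayed where it was.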