How to fix Current_Pending

* How to fix Current_Pending_Sector?
@ 2010-03-11 11:51 Iain Rauch
  2010-03-11 12:06 ` Michael Evans
  0 siblings, 1 reply; 9+ messages in thread
From: Iain Rauch @ 2010-03-11 11:51 UTC (permalink / raw)
  To: LinuxRaid

Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
This actually happened for two disks now.

I ran a check and then a repair on my array and they both gave mismatch_cnt
of 8.
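
For reference, md exposes scrub results through sysfs: writing `check` or `repair` to `md/sync_action` starts a pass, and `md/mismatch_cnt` holds the resulting count. The sketch below only reads that counter and converts it to bytes. The array name `md1` is taken from the kernel log later in this thread, the `sysfs_root` parameter is a hypothetical convenience so the function can be pointed at a test directory, and the conversion assumes the counter is expressed in 512-byte sectors (in which case a value of 8 would correspond to a single 4 KiB block).

```python
from pathlib import Path

def read_mismatch_cnt(array="md1", sysfs_root="/sys/block"):
    """Return the mismatch count md recorded for the last check/repair pass."""
    # sysfs_root is a hypothetical parameter added for testing, not an md feature.
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())

def mismatch_bytes(count, sector_size=512):
    """Convert a mismatch count to bytes, assuming it is counted in sectors."""
    return count * sector_size

# The value reported in this thread, expressed in bytes.
print(mismatch_bytes(8))
```

On a live system, `read_mismatch_cnt("md1")` would be called after a check or repair pass has finished.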

I ran a long self-test on both and they completed without error with no
errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
disk also has a 'UDMA_CRC_Error_Count' of 1.

I ran 'hdrecover' on both and they are both telling me "Couldn't recover
sector 2930277168". It's asking if I want to overwrite it with zeros to fix
it, but I would assume this will damage my array?

The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
for the array components. Does that sector fall outside my partition, and
hence would it be safe to overwrite it with zeros?
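
A quick arithmetic check on the numbers above, as a sketch only. It assumes 512-byte logical sectors, 0-based sector numbering, that "1500250M" means 1500250 * 10^6 bytes, and a hypothetical partition start at sector 63 (the real start offset is not given in this thread). Under those assumptions the reported sector number equals the total sector count of the disk, so it would lie one past the last addressable sector and well beyond the end of the partition.

```python
SECTOR_SIZE = 512                 # assumption: 512-byte logical sectors

disk_bytes = 1500301910016        # disk size quoted in the message
bad_sector = 2930277168           # sector reported by hdrecover
part_bytes = 1500250 * 10**6      # assumption: "M" means 10^6 bytes
part_start = 63                   # hypothetical start sector, not from the thread

total_sectors = disk_bytes // SECTOR_SIZE
last_lba = total_sectors - 1      # sectors are numbered from 0
part_end = part_start + part_bytes // SECTOR_SIZE - 1

print("total sectors:           ", total_sectors)
print("last addressable sector: ", last_lba)
print("partition ends at sector:", part_end)
print("past end of disk:        ", bad_sector > last_lba)
print("past end of partition:   ", bad_sector > part_end)
```

Reading "M" as MiB instead would give a partition larger than the disk itself, which is why the decimal reading is assumed here.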

Also, why did I have a mismatch_cnt? I haven't run another check since I did
the repair, as I wanted to fix the pending sector.

BTW, I have a 15 drive RAID6.

Hope y'all can help.

Iain

* Re: How to fix Current_Pending_Sector?
  2010-03-11 11:51 How to fix Current_Pending_Sector? Iain Rauch
@ 2010-03-11 12:06 ` Michael Evans
  2010-03-11 12:25   ` Iain Rauch
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Evans @ 2010-03-11 12:06 UTC (permalink / raw)
  To: Iain Rauch; +Cc: LinuxRaid

On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
<groups@email.iain.rauch.co.uk> wrote:
> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
> This actually happened for two disks now.
>
> I ran a check and then a repair on my array and they both gave mismatch_cnt
> of 8.
>
> I ran a long self-test on both and they completed without error with no
> errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
> disk also has a 'UDMA_CRC_Error_Count' of 1.
>
> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
> it, but I would assume this will damage my array?
>
> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
> for the array components. Does that sector fall outside my partition, and
> hence would it be safe to overwrite it with zeros?
>
> Also, why did I have a mismatch_cnt? I haven't run another check since I did
> the repair, as I wanted to fix the pending sector.
>
> BTW, I have a 15 drive RAID6.
>
> Hope y'all can help.
>
>
> Iain
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

If you are running RAID6 and it can read from all but two drives, then
it should still be able to calculate whatever would match the
remaining (presumed good) reads to fill in the latter two drives.  RECENT
kernels will try to write over failed sectors automatically, and only
kick the drive if the write fails.
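
To illustrate why RAID6 can rebuild two missing blocks in a stripe, here is a toy sketch of P/Q recovery over GF(2^8). It uses generator 2 and the reduction polynomial 0x11d, which is how the RAID6 syndrome is usually described; the function names and the one-byte-per-drive stripe are made up for illustration, and this is not the kernel's implementation.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    # The multiplicative group has order 255, so a^254 is the inverse of a.
    return gf_pow(a, 254)

def syndromes(data):
    """Return (P, Q) for a stripe holding one byte per data drive."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild data[x] and data[y] (treated as lost) from the rest plus P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):          # the lost values are never read
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx

# 13 illustrative data bytes: a 15-drive RAID6 has 13 data blocks per stripe.
stripe = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
          0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD]
p, q = syndromes(stripe)
print(recover_two(stripe, 3, 9, p, q) == (stripe[3], stripe[9]))
```

The same algebra applies byte by byte across a whole block; losing P or Q instead of a data block is a simpler case that is not shown here.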

Please provide more information.

Kernel version
mdadm version

Information about how the source block devices are split up before
mdadm sees them, and any related messages from the system-log.  The
relevant section should be near the end of a dmesg output when you've
just completed a check or repair.  Your syslog probably already
captured the same data and stored it elsewhere.

* Re: How to fix Current_Pending_Sector?
  2010-03-11 12:06 ` Michael Evans
@ 2010-03-11 12:25   ` Iain Rauch
  2010-03-11 16:54     ` Stefan /*St0fF*/ Hübner
  0 siblings, 1 reply; 9+ messages in thread
From: Iain Rauch @ 2010-03-11 12:25 UTC (permalink / raw)
  To: Michael Evans; +Cc: LinuxRaid

> On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
> <groups@email.iain.rauch.co.uk> wrote:
>> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
>> This actually happened for two disks now.
>> 
>> I ran a check and then a repair on my array and they both gave mismatch_cnt
>> of 8.
>> 
>> I ran a long self-test on both and they completed without error with no
>> errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
>> disk also has a 'UDMA_CRC_Error_Count' of 1.
>> 
>> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
>> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
>> it, but I would assume this will damage my array?
>> 
>> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
>> for the array components. Does that sector fall outside my partition, and
>> hence would it be safe to overwrite it with zeros?
>> 
>> Also, why did I have a mismatch_cnt? I haven't run another check since I did
>> the repair, as I wanted to fix the pending sector.
>> 
>> BTW, I have a 15 drive RAID6.
>> 
> 
> If you are running RAID6 and it can read from all but two drives then
> it should still be able to calculate whatever would match the
> remaining (presumed good) reads to fill the later two drives.  RECENT
> kernels will try to write over failed sectors automatically; and only
> kick the drive if the write fails.
> 
> Please provide more information.
> 
> Kernel version
> mdadm version
> 
> Information about how the source block devices are split up before
> mdadm sees them, and any related messages from the system-log.  The
> relevant section should be near the end of a dmesg output when you've
> just completed a check or repair.  Your syslog probably already
> captured the same data and stored it elsewhere.

I thought doing the repair was supposed to fix the issue, but it didn't seem
to touch it. I wonder if it is outside what md sees, but then how would it
have been noticed as unreadable? And is it coincidence that both drives have
the same unreadable sector?

root@Edna:/home/iain# uname -a
Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009
x86_64 GNU/Linux
root@Edna:/home/iain# mdadm -V
mdadm - v2.6.9 - 10th March 2009

I've pasted the end of the messages log below. There's loads of that all the
way through the repair, so I'm not sure how to filter out the useful bits.


Iain


Mar 10 07:21:21 Edna -- MARK --
Mar 10 07:29:48 Edna kernel: [135073.510019] Modules linked in: appletalk video output input_polldev nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc xfs bonding lp ppdev psmouse pcspkr k8temp serio_raw i2c_piix4 r8168 snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport shpchp ohci1394 ieee1394 sata_mv raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
Mar 10 07:29:48 Edna kernel: [135073.510019] CPU 0:
Mar 10 07:29:48 Edna kernel: [135073.510019] Modules linked in: appletalk video output input_polldev nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc xfs bonding lp ppdev psmouse pcspkr k8temp serio_raw i2c_piix4 r8168 snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport shpchp ohci1394 ieee1394 sata_mv raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
Mar 10 07:29:48 Edna kernel: [135073.510019] Pid: 1005, comm: md1_raid5 Not tainted 2.6.28-16-server #57-Ubuntu
Mar 10 07:29:48 Edna kernel: [135073.510019] RIP: 0010:[<ffffffffa007f7c9>] [<ffffffffa007f7c9>] raid6_sse24_gen_syndrome+0x1e9/0x28a [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019] RSP: 0018:ffff88012bd0db58 EFLAGS: 00000297
Mar 10 07:29:48 Edna kernel: [135073.510019] RAX: ffff8800ac397000 RBX: ffff88012bd0db90 RCX: ffff8800ac3978a0
Mar 10 07:29:48 Edna kernel: [135073.510019] RDX: ffff8800ac397880 RSI: 0000000000001000 RDI: 00000000ffffffff
Mar 10 07:29:48 Edna kernel: [135073.510019] RBP: ffff88012bd0db90 R08: 0000000000000880 R09: 00000000000008a0
Mar 10 07:29:48 Edna kernel: [135073.510019] R10: 00000000000008b0 R11: 0000000000000890 R12: ffff88012bd0db48
Mar 10 07:29:48 Edna kernel: [135073.510019] R13: ffff88012bd0db48 R14: ffff88012bd0dae0 R15: ffff88012f214000
Mar 10 07:29:48 Edna kernel: [135073.510019] FS:  00007f05d81076f0(0000) GS:ffffffff80a9b000(0000) knlGS:0000000000000000
Mar 10 07:29:48 Edna kernel: [135073.510019] CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
Mar 10 07:29:48 Edna kernel: [135073.510019] CR2: 00007fdd92599760 CR3: 0000000000201000 CR4: 00000000000006a0
Mar 10 07:29:48 Edna kernel: [135073.510019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 10 07:29:48 Edna kernel: [135073.510019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 10 07:29:48 Edna kernel: [135073.510019] Call Trace:
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa007f811>] ? raid6_sse24_gen_syndrome+0x231/0x28a [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa0076d9a>] compute_parity6+0x20a/0x380 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa0078696>] handle_parity_checks6+0x1d6/0x360 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa007c507>] handle_stripe6+0xb07/0xbd0 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa007d395>] handle_stripe+0x25/0x30 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffffa007d9f7>] raid5d+0x1f7/0x300 [raid456]
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff8056864c>] md_thread+0x5c/0x140
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff80268a50>] ? autoremove_wake_function+0x0/0x40
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff805685f0>] ? md_thread+0x0/0x140
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff802685e9>] kthread+0x49/0x90
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff80213979>] child_rip+0xa/0x11
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff802685a0>] ? kthread+0x0/0x90
Mar 10 07:29:48 Edna kernel: [135073.510019]  [<ffffffff8021396f>] ? child_rip+0x0/0x11
Mar 10 07:33:03 Edna kernel: [135268.444637] md: md1: requested-resync done.



* Re: How to fix Current_Pending_Sector?
  2010-03-11 12:25   ` Iain Rauch
@ 2010-03-11 16:54     ` Stefan /*St0fF*/ Hübner
  2010-03-15 11:20       ` Iain Rauch
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2010-03-11 16:54 UTC (permalink / raw)
  To: linux-raid

Am 11.03.2010 13:25, schrieb Iain Rauch:
>> On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
>> <groups@email.iain.rauch.co.uk> wrote:
>>> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
>>> This actually happened for two disks now.
>>>
>>> I ran a check and then a repair on my array and they both gave mismatch_cnt
>>> of 8.
>>>
>>> I ran a long self-test on both and they completed without error with no
>>> errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
>>> disk also has a 'UDMA_CRC_Error_Count' of 1.
>>>
>>> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
>>> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
>>> it, but I would assume this will damage my array?
>>>
>>> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
>>> for the array components. Does that sector fall outside my partition, and
>>> hence would it be safe to overwrite it with zeros?
>>>
>>> Also, why did I have a mismatch_cnt? I haven't run another check since I did
>>> the repair, as I wanted to fix the pending sector.
>>>
>>> BTW, I have a 15 drive RAID6.
>>>
>>
>> If you are running RAID6 and it can read from all but two drives then
>> it should still be able to calculate whatever would match the
>> remaining (presumed good) reads to fill the later two drives.  RECENT
>> kernels will try to write over failed sectors automatically; and only
>> kick the drive if the write fails.
>>
>> Please provide more information.
>>
>> Kernel version
>> mdadm version
>>
>> Information about how the source block devices are split up before
>> mdadm sees them, and any related messages from the system-log.  The
>> relevant section should be near the end of a dmesg output when you've
>> just completed a check or repair.  Your syslog probably already
>> captured the same data and stored it elsewhere.
> 
> I thought doing the repair was supposed to fix the issue, but it didn't seem
> to touch it. I wonder if it is outside what md sees, but then how would it
> have been noticed as unreadable? And is it coincidence that both drives have
> the same unreadable sector?
> 
> root@Edna:/home/iain# uname -a
> Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009
> x86_64 GNU/Linux
> root@Edna:/home/iain# mdadm -V
> mdadm - v2.6.9 - 10th March 2009
> 
> I paste the end of messages below. There's loads of that all the way through
> doing the repair so I'm not sure how to filter out the useful bits.
> 
> 
> Iain
> [...]

Hi Iain,

The "Current_Pending_Sector" count is a SMART attribute that gets
incremented during online operation (reading and writing sectors) AND
during offline drive scanning (also called SMART data collection), when
the drive finds that a sector cannot be read correctly at the first try
(offline data collection) or after applying various error-correction
techniques.
The easiest way to get rid of this problem: dd a sector of zeros onto
the broken sector, then fail the drive and re-add it.  Now wait until the
resync is done.
What I'm not sure about is: should one fail and re-add both drives
at once?  That way the redundancy would be lost...

Speaking about redundancy: our rule of thumb (at xtivate.de) is "each 4
drives need one redundancy" - so a redundancy of 2 with 15 drives is
kind of playing with your luck...

Good luck,
Stefan

* Re: How to fix Current_Pending_Sector?
  2010-03-11 16:54     ` Stefan /*St0fF*/ Hübner
@ 2010-03-15 11:20       ` Iain Rauch
  2010-03-18 17:35         ` CoolCold
  2010-03-18 19:37         ` David Rees
  0 siblings, 2 replies; 9+ messages in thread
From: Iain Rauch @ 2010-03-15 11:20 UTC (permalink / raw)
  To: st0ff, linux-raid

> Am 11.03.2010 13:25, schrieb Iain Rauch:
>>> On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
>>> <groups@email.iain.rauch.co.uk> wrote:
>>>> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
>>>> This actually happened for two disks now.
>>>> 
>>>> I ran a check and then a repair on my array and they both gave mismatch_cnt
>>>> of 8.
>>>> 
>>>> I ran a long self-test on both and they completed without error with no
>>>> errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
>>>> disk also has a 'UDMA_CRC_Error_Count' of 1.
>>>> 
>>>> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
>>>> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
>>>> it, but I would assume this will damage my array?
>>>> 
>>>> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
>>>> for the array components. Does that sector fall outside my partition, and
>>>> hence would it be safe to overwrite it with zeros?
>>>> 
>>>> Also, why did I have a mismatch_cnt? I haven't run another check since I
>>>> did
>>>> the repair, as I wanted to fix the pending sector.
>>>> 
>>>> BTW, I have a 15 drive RAID6.
>>>> 
>>> 
>>> If you are running RAID6 and it can read from all but two drives then
>>> it should still be able to calculate whatever would match the
>>> remaining (presumed good) reads to fill the later two drives.  RECENT
>>> kernels will try to write over failed sectors automatically; and only
>>> kick the drive if the write fails.
>>> 
>>> Please provide more information.
>>> 
>>> Kernel version
>>> mdadm version
>>> 
>>> Information about how the source block devices are split up before
>>> mdadm sees them, and any related messages from the system-log.  The
>>> relevant section should be near the end of a dmesg output when you've
>>> just completed a check or repair.  Your syslog probably already
>>> captured the same data and stored it elsewhere.
>> 
>> I thought doing the repair was supposed to fix the issue, but it didn't seem
>> to touch it. I wonder if it is outside what md sees, but then how would it
>> have been noticed as unreadable? And is it coincidence that both drives have
>> the same unreadable sector?
>> 
>> root@Edna:/home/iain# uname -a
>> Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009
>> x86_64 GNU/Linux
>> root@Edna:/home/iain# mdadm -V
>> mdadm - v2.6.9 - 10th March 2009
>> 
>> I paste the end of messages below. There's loads of that all the way through
>> doing the repair so I'm not sure how to filter out the useful bits.
>> 
>> 
>> Iain
>> [...]
> 
> Hi Iain,
> 
> the "Current_pending_sectors" is a smart attribute which gets
> incremented during online (reading and writing sectors) AND offline
> drive scanning (also called SMART Data Collection), when the drive finds
> out a sector cannot be correctly read at the first try (offline data
> collection) or after applying various error-correction techniques.
> The easiest way to get rid of this problem: dd a sector of zeros onto
> the broken sector, then fail the drive, re-add it.  Now wait until the
> resync is done.
> The fact I'm not sure about is: should one fail and re-add both drives
> at once?  As by that the redundancy would get lost...
> 
> Speaking about redundancy: our rule of thumb (at xtivate.de) is "each 4
> drives need one redundancy" - so a redundancy of 2 with 15 drives is
> kind of playing with your luck...
> 
> Good luck,
> Stefan

Well, I failed one of the drives and allowed 'hdrecover' to overwrite the
unreadable sector, but it still couldn't fix it. Here's its report:

Wiping sector 2930277168...
Checking sector is now readable...
I still couldn't read the sector!
I'm sorry, but even writing to the sector hasn't fixed it - there's nothing
more I can do!
Summary:
  1 bad sectors found
  of those 0 were recovered
  and 1 could not be recovered and were destroyed causing data loss

The 'Current_Pending_Sector' was still 1, so I dd'd zeros onto the whole drive.
I guess I could have just done part of it, but I suppose that verified the
whole drive 'works'. It only took ~5 hours. Funnily enough this did bring the
'Current_Pending_Sector' count back to zero. Still no error reports in the
SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
gone up to one?

I re-partitioned and added it to the array and it rebuilt fine in ~12 hours.

Repeated the process with the second drive and everything's back to normal.

The drive that had the 'UDMA_CRC_Error_Count' still says 1, but I don't
think I need to worry about that?

In direct reply to Stefan:

I think you meant to dd zeros onto the drive /after/ failing it - would have
caused corruption otherwise?

I definitely think it made sense to do one at a time.
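
The order argued for in this message (fail the member first, then overwrite, then add it back, one drive at a time) can be sketched as a list of command lines. This only builds and prints the commands; it does not run them. The device names and the sector number are placeholders rather than values from this thread, and the sketch covers overwriting a single sector, whereas the whole drive was zeroed and re-partitioned in the case described above.

```python
def pending_sector_plan(array, member, disk, sector, sector_size=512):
    """Build (not run) the commands for the fail -> overwrite -> add order."""
    return [
        # Take the member out of the array before touching its contents.
        ["mdadm", array, "--fail", member],
        ["mdadm", array, "--remove", member],
        # Overwrite exactly one sector of the underlying disk with zeros.
        ["dd", "if=/dev/zero", f"of={disk}", f"bs={sector_size}",
         f"seek={sector}", "count=1", "oflag=direct"],
        # Add the member back and let md rebuild it.
        ["mdadm", array, "--add", member],
    ]

# Placeholder names: /dev/md1, /dev/sdX1, /dev/sdX and sector 123456 are hypothetical.
for cmd in pending_sector_plan("/dev/md1", "/dev/sdX1", "/dev/sdX", 123456):
    print(" ".join(cmd))
```

Note that `seek=` here counts sectors from the start of the whole disk, so a sector number reported relative to a partition would need the partition's start offset added first.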

One parity drive for every four seems a bit extreme, especially when you
have a backup (which I don't). I'm fairly happy with 15 drives in RAID 6. I
had 24 drives before, and that did give me a few problems :p Just need to
keep the drives healthy. (Array scrubs, SMART tests etc).


Iain



* Re: How to fix Current_Pending_Sector?
  2010-03-15 11:20       ` Iain Rauch
@ 2010-03-18 17:35         ` CoolCold
  2010-03-18 19:37         ` David Rees
  1 sibling, 0 replies; 9+ messages in thread
From: CoolCold @ 2010-03-18 17:35 UTC (permalink / raw)
  To: Iain Rauch; +Cc: st0ff, linux-raid

I had a similar issue: there were 5 currently unreadable (pending)
sectors and 1 offline uncorrectable sector, and then the drive was kicked
out of the RAID. Re-adding the drive helped, and that bad sector went
away. Now there are 2 pending and 1 uncorrectable, so I'm going to fix
those two.
My question is: is there any way to resync the array faster? Say I
update the bitmaps from the current 0.9, fail the drive, dd over the
sectors, and add the drive back: will the bitmap help to resync just the
parts that have changed rather than the whole drive?


On Mon, Mar 15, 2010 at 2:20 PM, Iain Rauch
<groups@email.iain.rauch.co.uk> wrote:
>> Am 11.03.2010 13:25, schrieb Iain Rauch:
>>>> On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
>>>> <groups@email.iain.rauch.co.uk> wrote:
>>>>> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
>>>>> This actually happened for two disks now.
>>>>>
>>>>> I ran a check and then a repair on my array and they both gave mismatch_cnt
>>>>> of 8.
>>>>>
>>>>> I ran a long self-test on both and they completed without error with no
>>>>> errors logged. Yet the 'Current_Pending_Sector' is still 1 on both, and one
>>>>> disk also has a 'UDMA_CRC_Error_Count' of 1.
>>>>>
>>>>> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
>>>>> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
>>>>> it, but I would assume this will damage my array?
>>>>>
>>>>> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
>>>>> for the array components. Does that sector fall outside my partition, and
>>>>> hence would it be safe to overwrite it with zeros?
>>>>>
>>>>> Also, why did I have a mismatch_cnt? I haven't run another check since I
>>>>> did
>>>>> the repair, as I wanted to fix the pending sector.
>>>>>
>>>>> BTW, I have a 15 drive RAID6.
>>>>>
>>>>
>>>> If you are running RAID6 and it can read from all but two drives then
>>>> it should still be able to calculate whatever would match the
>>>> remaining (presumed good) reads to fill the later two drives.  RECENT
>>>> kernels will try to write over failed sectors automatically; and only
>>>> kick the drive if the write fails.
>>>>
>>>> Please provide more information.
>>>>
>>>> Kernel version
>>>> mdadm version
>>>>
>>>> Information about how the source block devices are split up before
>>>> mdadm sees them, and any related messages from the system-log.  The
>>>> relevant section should be near the end of a dmesg output when you've
>>>> just completed a check or repair.  Your syslog probably already
>>>> captured the same data and stored it elsewhere.
>>>
>>> I thought doing the repair was supposed to fix the issue, but it didn't seem
>>> to touch it. I wonder if it is outside what md sees, but then how would it
>>> have been noticed as unreadable? And is it coincidence that both drives have
>>> the same unreadable sector?
>>>
>>> root@Edna:/home/iain# uname -a
>>> Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009
>>> x86_64 GNU/Linux
>>> root@Edna:/home/iain# mdadm -V
>>> mdadm - v2.6.9 - 10th March 2009
>>>
>>> I paste the end of messages below. There's loads of that all the way through
>>> doing the repair so I'm not sure how to filter out the useful bits.
>>>
>>>
>>> Iain
>>> [...]
>>
>> Hi Iain,
>>
>> the "Current_pending_sectors" is a smart attribute which gets
>> incremented during online (reading and writing sectors) AND offline
>> drive scanning (also called SMART Data Collection), when the drive finds
>> out a sector cannot be correctly read at the first try (offline data
>> collection) or after applying various error-correction techniques.
>> The easiest way to get rid of this problem: dd a sector of zeros onto
>> the broken sector, then fail the drive, re-add it.  Now wait until the
>> resync is done.
>> The fact I'm not sure about is: should one fail and re-add both drives
>> at once?  As by that the redundancy would get lost...
>>
>> Speaking about redundancy: our rule of thumb (at xtivate.de) is "each 4
>> drives need one redundancy" - so a redundancy of 2 with 15 drives is
>> kind of playing with your luck...
>>
>> Good luck,
>> Stefan
>
> Well, I failed one of the drives and allowed 'hdrecover' to overwrite the
> unreadable sector, but it still couldn't fix it. Here's its report:
>
> Wiping sector 2930277168...
> Checking sector is now readable...
> I still couldn't read the sector!
> I'm sorry, but even writing to the sector hasn't fixed it - there's nothing
> more I can do!
> Summary:
>  1 bad sectors found
>  of those 0 were recovered
>  and 1 could not be recovered and were destroyed causing data loss
>
> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
> I guess I could have just done part of it, but I suppose that verified the
> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
> Current_pending_sectors count back to zero. Still no error reports in the
> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
> gone up to one?
>
> I re-partitioned and added it to the array and it rebuilt fine in ~12 hours.
>
> Repeated the process with the second drive and everything's back to normal.
>
> The drive that had the 'UDMA_CRC_Error_Count' still says 1, but I don't
> think I need to worry about that?
>
> In direct reply to Stefan:
>
> I think you meant to dd zeros onto the drive /after/ failing it - would have
> caused corruption otherwise?
>
> I definitely think it made sense to do one at a time.
>
> One parity drive for every four seems a bit extreme, especially when you
> have a backup (which I don't). I'm fairly happy with 15 drives in RAID 6. I
> had 24 drives before, and that did give me a few problems :p Just need to
> keep the drives healthy. (Array scrubs, SMART tests etc).
>
>
> Iain
>
>



-- 
Best regards,
[COOLCOLD-RIPN]

* Re: How to fix Current_Pending_Sector?
  2010-03-15 11:20       ` Iain Rauch
  2010-03-18 17:35         ` CoolCold
@ 2010-03-18 19:37         ` David Rees
  2010-03-18 21:47           ` Greg Freemyer
  1 sibling, 1 reply; 9+ messages in thread
From: David Rees @ 2010-03-18 19:37 UTC (permalink / raw)
  To: Iain Rauch; +Cc: st0ff, linux-raid

On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch
<groups@email.iain.rauch.co.uk> wrote:
> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
> I guess I could have just done part of it, but I suppose that verified the
> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
> Current_pending_sectors count back to zero. Still no error reports in the
> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
> gone up to one?

No - the drive was able to successfully write to the sector it was
unable to read from.  If the write had failed, it would have
reallocated the sector.
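
The behaviour described here can be written down as a toy state model. This is only a sketch of the explanation given in this message (a pending sector is cleared by a successful rewrite and is reallocated only if the write fails); the class and field names are invented, and real drive firmware may behave differently.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDrive:
    bad_media: set = field(default_factory=set)  # sectors that cannot take a write
    pending: set = field(default_factory=set)    # sectors that failed a read
    reallocated_events: int = 0

    def failed_read(self, lba):
        """A read error puts the sector on the pending list."""
        self.pending.add(lba)

    def write(self, lba):
        """Rewrite a sector; remap it to a spare only if the write fails."""
        if lba in self.bad_media:
            self.bad_media.discard(lba)   # sector now lives on a spare
            self.reallocated_events += 1
        self.pending.discard(lba)         # either way it is no longer pending

    def counters(self):
        return {"Current_Pending_Sector": len(self.pending),
                "Reallocated_Event_Count": self.reallocated_events}

# Case from this thread: the read failed, but the rewrite succeeded in place.
rewritable = ToyDrive()
rewritable.failed_read(100)
rewritable.write(100)
print(rewritable.counters())

# Contrasting case: the media cannot take the write, so the sector is remapped.
damaged = ToyDrive(bad_media={100})
damaged.failed_read(100)
damaged.write(100)
print(damaged.counters())
```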

-Dave

* Re: How to fix Current_Pending_Sector?
  2010-03-18 19:37         ` David Rees
@ 2010-03-18 21:47           ` Greg Freemyer
  2010-03-19  7:22             ` Stefan /*St0fF*/ Hübner
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Freemyer @ 2010-03-18 21:47 UTC (permalink / raw)
  To: David Rees; +Cc: Iain Rauch, st0ff, linux-raid

On Thu, Mar 18, 2010 at 3:37 PM, David Rees <drees76@gmail.com> wrote:
> On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch
> <groups@email.iain.rauch.co.uk> wrote:
>> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
>> I guess I could have just done part of it, but I suppose that verified the
>> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
>> Current_pending_sectors count back to zero. Still no error reports in the
>> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
>> gone up to one?
>
> No - the drive was able to successfully write to the sector it was
> unable to read from.  If the write had failed, it would have
> reallocated the sector.
>
> -Dave

Dave,

Most sector writes are blind (ie. non-verified).

Is your theory that if the sector is marked as a Pending_Bad_Sector a
write is done, but it is verified, and a reallocate only occurs if the
verify fails?

I've never heard that theory, but it makes great sense.

Greg

* Re: How to fix Current_Pending_Sector?
  2010-03-18 21:47           ` Greg Freemyer
@ 2010-03-19  7:22             ` Stefan /*St0fF*/ Hübner
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2010-03-19  7:22 UTC (permalink / raw)
  Cc: linux-raid

Hi Greg,

Am 18.03.2010 22:47, schrieb Greg Freemyer:
> On Thu, Mar 18, 2010 at 3:37 PM, David Rees <drees76@gmail.com> wrote:
>> On Mon, Mar 15, 2010 at 4:20 AM, Iain Rauch
>> <groups@email.iain.rauch.co.uk> wrote:
>>> The 'Current_Pending_Sector' was still 1, so I dd zero onto the whole drive.
>>> I guess I could have just done part of it, but I suppose that verified the
>>> whole drive 'works'. It only took ~5 hours. Funnily enough this did fix the
>>> Current_pending_sectors count back to zero. Still no error reports in the
>>> SMART data, and 'Reallocated_Event_Count' didn't go up - shouldn't that have
>>> gone up to one?
>>
>> No - the drive was able to successfully write to the sector it was
>> unable to read from.  If the write had failed, it would have
>> reallocated the sector.
>>
>> -Dave
> 
> Dave,
> 
> Most sector writes are blind (ie. non-verified).

That is certainly right!

> 
> Is your theory that if the sector is marked as a Pending_Bad_Sector a
> write is done, but it is verified, and a reallocate only occurs if the
> verify fails?

If the drive has noted erroneous behaviour on a sector (i.e. marked it
pending), it will try to resolve the problem by verifying the write.  It only
makes sense that way, doesn't it?
> 
> I've never heard that theory, but it makes great sense.

IC, it does ;)
> 
> Greg

Stefan

Thread overview: 9+ messages
2010-03-11 11:51 How to fix Current_Pending_Sector? Iain Rauch
2010-03-11 12:06 ` Michael Evans
2010-03-11 12:25   ` Iain Rauch
2010-03-11 16:54     ` Stefan /*St0fF*/ Hübner
2010-03-15 11:20       ` Iain Rauch
2010-03-18 17:35         ` CoolCold
2010-03-18 19:37         ` David Rees
2010-03-18 21:47           ` Greg Freemyer
2010-03-19  7:22             ` Stefan /*St0fF*/ Hübner
