Re: strange problem with raid6 read errors on active non-degraded array


From: NeilBrown <neilb@suse.de>
To: Pedro Teixeira <finas@aeiou.pt>
Cc: linux-raid@vger.kernel.org
Subject: Re: strange problem with raid6 read errors on active non-degraded array
Date: Thu, 3 Jul 2014 12:40:46 +1000
Message-ID: <20140703124046.74272588@notabene.brown>
In-Reply-To: <20140702125434.Horde.abbwKfYRo99Ts-L6UvsCEIA@webmail.aeiou.pt>

On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are  
> seagate sshd ST1000DX001.
> 
> So I run the "dd if=/dev/md0 of=/dev/null  bs=4096" and it failed on  
> alot of places. I had to restart the command several times with the  
> skip parameter set to a couple of blocks after the last block error.  
> It run for about 1.5TB of the total 13TB of the volume.
> The md volume didn't drop any drive when running this.
> 
> dmesg showed:
> 
> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546

I love numbers, thanks.
The logical block size is 4096, or 8 sectors (1 sector is defined as 512
bytes), so this is at 
  196012546*8 == 1568100368 sectors into the array.

The array has a chunksize of 512K, or 1024 sectors so
 196012546*8/1024 = 1531348.015625

gives us the chunk number, and the remaining fraction of a chunk.

The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so to
find where the above chunk is stored we divide by 14

   1531348/14 = 109382.0000

So that is chunk 109382 on the first device (though with rotating data,
it might not be the very first).

Add back in the fractional part, multiply by 1024 sectors per chunk, and add
the Data Offset,

  109382.01562500*1024+262144 = 112269328

So it seems that sector 112269328 on some device is bad.
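
The same arithmetic can be sketched in shell, using integer division and remainder in place of the fractional chunk. This is only a sketch under the geometry quoted in this thread (4096-byte blocks, 512K chunks, 14 data chunks per stripe, Data Offset 262144); the function and variable names are made up for illustration, and it does not work out which member device holds the chunk.

```shell
#!/bin/sh
# Map an md0 logical block number (as printed in dmesg) to a per-device sector.
# Geometry values are the ones quoted in this thread; adjust for other arrays.
SECTORS_PER_BLOCK=8      # 4096-byte logical block / 512-byte sectors
CHUNK_SECTORS=1024       # 512K chunk
DATA_DISKS=14            # 16-device RAID6: 14 data chunks per stripe
DATA_OFFSET=262144       # Data Offset, in sectors

# md_block_to_dev_sector BLOCK  (hypothetical helper name)
md_block_to_dev_sector() {
    array_sector=$(( $1 * SECTORS_PER_BLOCK ))
    chunk=$(( array_sector / CHUNK_SECTORS ))       # chunk number within the array
    in_chunk=$(( array_sector % CHUNK_SECTORS ))    # sectors into that chunk
    stripe=$(( chunk / DATA_DISKS ))                # chunk number on a member device
    echo $(( stripe * CHUNK_SECTORS + in_chunk + DATA_OFFSET ))
}

md_block_to_dev_sector 196012546   # prints 112269328
```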

> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>   
> raid.b" before and after running the "dd" command returned no changes:
> 

I didn't notice the fact that the bad block logs were not empty before, sorry.
Anyway:...
> 
> Bad-blocks on /dev/sdb:
>             112269328 for 512 sectors

Look at that - exactly the number I calculated.  I love it when that works
out.
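
Note that the entry covers 512 sectors, so any read that lands in that range on that device is affected, not just the one sector. A quick way to test a computed sector against a "START for LENGTH sectors" entry might look like this (a sketch only; `in_bad_range` is a made-up helper, not part of mdadm):

```shell
#!/bin/sh
# in_bad_range SECTOR START LENGTH
# Succeeds if SECTOR lies in the half-open range [START, START+LENGTH).
in_bad_range() {
    [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + $3 )) ]
}

# Values taken from the dmesg calculation and the /dev/sdb entry above.
if in_bad_range 112269328 112269328 512; then
    echo "sector is covered by a recorded bad block"
fi
```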

So the problem is exactly that some blocks are thought by md to be bad.

Blocks get recorded as bad (for raid6) when:

 - a 'read' reported an error which could not be fixed, either
   because the array was degraded so the data could not be recovered,
   or because the attempt to write restored data failed
 - when recovering a spare, if the data to be written cannot be found (due to
   errors on other devices)
 - when a 'write' request to a device fails

When your array had three failed devices, some reads and writes would have
failed.  Maybe that caused the bad blocks to be recorded.
What sort of device failures were they?  If the device became completely
inaccessible, then it would not have been possible to record the bad block
information.

Can you describe the sequence of events that led to the three failures?
When you put the array back together, did you --create it, or --assemble
--force?

There isn't an easy way to remove the bad block list, as doing so is normally
asking for data corruption.
However it is probably justified in your case.
As it happens I included code in the kernel to make it possible to remove bad
blocks from the list - it was intended for testing only but I never removed
it.
If you run
  sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
  while read a; do
     echo "$a" > /sys/block/md0/md/dev-sdq/bad_blocks
  done

then it should clear all of the bad blocks recorded on sdq.
You should probably fail/remove the last two devices that you added to the
array before you do this, as they probably don't have properly up-to-date
information, and clearing the list with them still in the array will cause
corruption.
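
Before writing anything to sysfs it may be worth previewing what would be sent. The sketch below only prints the lines and writes nothing; it assumes the bad_blocks file holds one "sector length" pair per line, and `preview_clear` is a made-up name.

```shell
#!/bin/sh
# preview_clear FILE
# Print the "-sector length" lines that would be written back to clear each
# entry. FILE is assumed to hold one "sector length" pair per line.
preview_clear() {
    while read -r sector length; do
        if [ -n "$sector" ]; then
            echo "-$sector $length"
        fi
    done < "$1"
}

# Example (read-only; nothing is written to the array):
#   preview_clear /sys/block/md0/md/dev-sdq/bad_blocks
```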

I probably need to think about better ways to handle the bad block lists.

NeilBrown
