Re: strange problem with raid6 read errors on active non-degraded array

From: NeilBrown <neilb@suse.de>
To: Pedro Teixeira <finas@aeiou.pt>
Cc: linux-raid@vger.kernel.org
Subject: Re: strange problem with raid6 read errors on active non-degraded array
Date: Wed, 2 Jul 2014 20:45:02 +1000
Message-ID: <20140702204502.6b538fa8@notabene.brown>
In-Reply-To: <20140702103241.Horde.iempNvYRo99Ts9G5Op7ionA@webmail.aeiou.pt>

On Wed, 02 Jul 2014 10:32:41 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> - I'm having the following problem on a raid6 md volume consisting of
> 16 1TB Seagate SSHDs (using kernel 3.15.3 or 3.14.0). mdadm is 3.3.
> 
>   - every time I run a fsck.ext4 I will get the exact same errors (  
> ...short read ). Forcing a repair on the md0 volume shows no errors  
> and completes without problems. All disks are active and the volume is  
> not degraded, still I can't get rid of the short errors on those 16  
> blocks and when the filesystem is mounted the read errors will come up  
> from time to time as they are probably in use.
> 
> - If I try to read those blocks with dd ( dd if=/dev/md0  of=test.txt
> seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file
> but the file doesn't appear to have anything in it ( and the file
> doesn't take up 1.8T on disk, as the disk is much smaller )
> 
> - this started happening after a three-disk failure. I
> recovered from that failure by recreating the array with the
> non-failed 13 disks plus the last failed one ( events didn't differ
> much ). I then re-added the other disks. The failed disks are all
> physically good; I tested them with hdat2 and they don't have read/write
> errors, so I reused them. I don't know why they failed, maybe some
> incompatibility between the SSHDs and the LSI HBA controller.
> 
> root@nas3:/# dd if=/dev/md0  of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul  2 10:22 teste.txt
> root@nas3:/#
> 
> 
> 
> root@nas3:/# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]  
> sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
>        13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2  
> [16/16] [UUUUUUUUUUUUUUUU]
> 
> - When doing a fsck.ext4 of /dev/md0 it returns the following ( and I  
> can do it over and over again with the exact same errors) :
> 
> root@nas3:/# fsck.ext4 -f /dev/md0
> e2fsck 1.42.10 (18-May-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Error reading block 458227712 (Attempt to read block from filesystem  
> resulted in short read) while reading inode and block bitmaps.  Ignore  
> error<y>? yes


Can't possibly happen!

(Don't worry, I say that a lot - I'm usually wrong).

What sort of computer?  In particular, is it 32-bit or 64-bit?

Try using 'dd' to read a few megabytes at various offsets (1G, 2G, 4G, 6G,
8G, ...) and see whether there is a pattern in where it can read and where
it cannot.

NeilBrown
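
The offset probe suggested above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the function name `probe`,
its argument order, and the "ok"/"short" output format are hypothetical and
not from the thread. It uses dd's skip= operand, which positions the input
(seek= positions the output file instead), and counts the bytes actually
returned, so a short read shows up even when dd exits successfully.

```shell
# probe DEV COUNT_MIB OFFSET_MIB...
# Read COUNT_MIB MiB from DEV at each OFFSET_MIB (in MiB) and report whether
# the full amount came back. All names here are illustrative assumptions.
probe() {
    dev=$1
    count=$2
    shift 2
    want=$((count * 1048576))
    for off in "$@"; do
        # bs is 1 MiB, so skip= and count= are both in MiB units.
        # dd writes the data to stdout; wc -c counts the bytes that arrived.
        got=$(dd if="$dev" bs=1048576 skip="$off" count="$count" 2>/dev/null \
              | wc -c | tr -d ' ')
        if [ "$got" -eq "$want" ]; then
            echo "${off}M ok"
        else
            echo "${off}M short ($got bytes)"
        fi
    done
}
```

For the offsets mentioned above, an invocation might look like
`probe /dev/md0 4 1024 2048 4096 6144 8192` (run as root). Reading to stdout
rather than to a file keeps the probe from writing anything to disk.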


Thread overview: 19+ messages
2014-07-02  9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
2014-07-02  9:52 ` Roman Mamedov
2014-07-02 10:07   ` Pedro Teixeira
2014-07-02 10:11     ` Roman Mamedov
2014-07-02 10:37       ` Pedro Teixeira
2014-07-02 11:03       ` Pedro Teixeira
2014-07-02 10:45 ` NeilBrown [this message]
2014-07-02 11:54   ` Pedro Teixeira
     [not found]     ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
2014-07-02 14:14       ` Pedro Teixeira
2014-07-02 14:55         ` Lars Täuber
2014-07-02 16:35         ` Ethan Wilson
     [not found]           ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
2014-07-02 21:34             ` Ethan Wilson
2014-07-02 16:43     ` John Stoffel
     [not found]       ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-02 18:41         ` Pedro Teixeira
2014-07-02 19:01         ` John Stoffel
2014-07-03  2:40     ` NeilBrown
2014-07-03  8:29       ` Pedro Teixeira
2014-07-03 10:39       ` Pedro Teixeira
2014-07-03 21:06       ` Pedro Teixeira
