strange problem with raid6 read errors on active non-degraded array

Linux RAID subsystem development

* strange problem with raid6 read errors on active non-degraded array
@ 2014-07-02  9:32 Pedro Teixeira
  2014-07-02  9:52 ` Roman Mamedov
  2014-07-02 10:45 ` NeilBrown
  0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02  9:32 UTC (permalink / raw)
  To: linux-raid

- I'm having the following problem on a raid6 md volume consisting of
16 1TB Seagate SSHDs (using kernel 3.15.3 or 3.14.0). mdadm is 3.3.

- Every time I run fsck.ext4 I get the exact same errors
("...short read"). Forcing a repair on the md0 volume shows no errors
and completes without problems. All disks are active and the volume is
not degraded, yet I can't get rid of the short read errors on those 16
blocks, and when the filesystem is mounted the read errors come up
from time to time, as the blocks are probably in use.

- If I try to read those blocks with dd ( dd if=/dev/md0 of=teste.txt
seek=458227712 count=6 bs=4096 ) it instantly creates a 1.8T file,
but the file doesn't appear to have anything in it (and the file
doesn't take up 1.8T on disk, as the disk is much smaller).

- This started happening after a three-disk failure. I
recovered from that failure by recreating the array with the
13 non-failed disks plus the last failed one (the event counts didn't
differ much). I then re-added the other disks. The failed disks are all
physically good: I tested them with hdat2 and they show no read/write
errors, so I reused them. I don't know why they failed; maybe some
incompatibility between the SSHDs and the LSI HBA controller.

root@nas3:/# dd if=/dev/md0  of=teste.txt seek=458227712 count=6 bs=4096
6+0 records in
6+0 records out
24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
root@nas3:/# ls -lah teste.txt
-rw-r--r-- 1 root root 1.8T Jul  2 10:22 teste.txt
root@nas3:/#
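
A note on the dd invocation above: in dd, `seek=` positions the output file, while `skip=` positions the input. The command therefore read the first 6 blocks of /dev/md0 and wrote them 458227712 blocks into teste.txt, which would explain the sparse 1.8T file. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the round-up step is an assumption about how `ls -h` displays sizes on this system.

```python
import math

BLOCK_SIZE = 4096          # bs=4096 from the dd command
SEEK_BLOCKS = 458227712    # seek=458227712 (an offset into the OUTPUT file)
COUNT = 6                  # count=6


def sparse_file_size(seek_blocks, count, block_size):
    """Apparent size of the output file when dd seeks before writing."""
    return (seek_blocks + count) * block_size


size = sparse_file_size(SEEK_BLOCKS, COUNT, BLOCK_SIZE)
tib = size / 2**40
# Round up to one decimal place, as `ls -h` is assumed to do.
rounded = math.ceil(tib * 10) / 10
print(f"{size} bytes = {tib:.3f} TiB, displayed as roughly {rounded}T")
```

To read the affected blocks from the array rather than from its start, `skip=458227712` would be the option to use in place of `seek=`.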



root@nas3:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]  
sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
       13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2  
[16/16] [UUUUUUUUUUUUUUUU]

- Running fsck.ext4 on /dev/md0 returns the following (and I
can repeat it over and over with the exact same errors):

root@nas3:/# fsck.ext4 -f /dev/md0
e2fsck 1.42.10 (18-May-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 458227712 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227713 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227714 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227715 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227716 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227717 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227718 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227719 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227720 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227721 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227722 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227723 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227724 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227725 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227726 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227727 (Attempt to read block from filesystem  
resulted in short read) while reading inode and block bitmaps.  Ignore  
error<y>? yes
Force rewrite<y>? yes
Block bitmap differences:  +(458227712--458231839)  
+(458234642--458235681) +(458244447--458245519)  
+(458246454--458247229) +(458248461--458248750) +458250468 +458251108  
+(458261280--458261284) +(458263296--458263297) +458263312 +458263328  
+(458265376--458265379) +(458267392--458267394)  
+(458269440--458269441) +458269456 +(458269472--458269474)  
+(458271520--458271543) +(458273536--458273547)  
+(458275584--458275585) +458275600 +458275616 +(458277664--458277669)  
+(458279680--458279682) +(458281728--458281729)  
+(458283776--458284059) +458285824 +458285837 +458285840  
+(458285856--458285857) +(458287904--458287907)  
+(458289920--458289922) +458291968 +458291984 +458292000  
+(458294048--458294054) +(458296064--458296065) +458296080  
+(458296096--458296116) +(458298144--458298169)  
+(458300160--458300504) +(458302208--458302209) +458302224 +458302240  
+(458304288--458304298) +(458306304--458307950)  
+(458310400--458310401) +458310416 +458310432 +(458312480--458312483)  
+458314496 +458316544 +458316550 +458317824 +(458321152--458321950)  
+458321952 +458321954 +458321956 +458321958 +458321965 +458321981  
+(458323981--458323986) +(458327296--458327297)  
+(458328094--458328097) +(458331392--458331393)  
+(458333440--458333441) +(458335488--458335489)  
+(458337536--458337537) +458339584 +458339593 +458339595 +458339600  
+458339616 +458341616 +(458343680--458343681) +(458345728--458345729)  
+458347776 +458347792 +458347808 +458349808 +(458351872--458351874)  
+458351888 +458351904 +458353904 +(458355968--458355969)  
+(458356765--458356815) +(458359809--458360062) +458360064 +458360080  
+458360096 +(458360113--458360120) +458362096 +(458364160--458364161)  
+458364176 +458364192 +458366192 +(458368256--458368257) +458370304  
+458370307 +(458373115--458373116) +458373119 +(458373127--458373160)  
+(458375168--458379263) +(458379271--458379304)  
+(458381319--458381352) +(458383360--458432511)  
+(458433367--458433686) +(458434560--458514535)  
+(458516480--458516488) +(458516496--458561535)  
+(458561680--458565631) +(458565648--458574328)  
+(458574416--458575982) +(458576912--458577167)  
+(458577680--458579535) +(458579968--458582015)  
+(458594304--458594585) +(458594632--458595592)  
+(458595627--458595725) +(458595728--458596527)  
+(458596545--458596687) +(458597423--458598607)  
+(458598990--458602495) +(458602922--458603023)  
+(458604256--458604623) +(458605072--458605135)  
+(458605520--458605717) +(458605908--458608536)  
+(458608642--458609662) +(458609680--458610704)  
+(458610776--458613449) +(458613519--458615179)  
+(458616265--458616831) +(458617702--458618383)  
+(458618512--458619007) +(458619088--458619151)  
+(458619896--458621625) +(458621648--458622175)  
+(458622224--458622489) +(458622508--458622830)  
+(458622848--458623129) +(458623162--458623345)  
+(458623394--458623953) +(458623962--458624460)  
+(458624896--458624975) +(458624986--458626127)  
+(458626282--458627727) +(458627920--458629119)  
+(458629195--458632207) +(458632695--458632841)  
+(458633168--458633231) +(458633668--458633923)  
+(458634370--458634621) +(458634646--458634660)  
+(458634704--458635306) +(458635344--458636303)  
+(458636734--458637311) +(458638356--458639359)  
+(458639440--458640109) +(458640195--458645071)  
+(458645178--458645503) +(458645776--458645922)  
+(458646009--458646479) +(458646546--458647589)  
+(458647696--458648655) +(458649040--458649807)  
+(458650640--458651663) +(458652432--458653695)  
+(458657064--458657199) +(458657792--458658625)  
+(458658628--458658631) +(458658640--458659231)  
+(458659513--458659748) +(458659792--458659882)  
+(458660432--458661337) +(458661899--458663417)  
+(458663760--458664083) +(458665232--458665295)  
+(458665552--458665706) +(458665808--458668031)  
+(458668240--458668855) +(458669126--458669127)  
+(458669419--458670079) +(458674183--458674216) +458675464  
+(458676231--458676267) +(458676360--458676370)  
+(458676488--458676498) +458676616 +(458676744--458676754)  
+(458676872--458676873) +458677000 +458677128 +(458677256--458677257)  
+458677384 +458677512 +(458677640--458678410) +458678536 +458678664  
+458678666 +(458678792--458678794) +458678920 +(458679048--458679049)  
+458679306 +(458679688--458679770) +(458680327--458680360)  
+(458681736--458681781) +(458682375--458682408)  
+(458683784--458685154) +(458685192--458685193)  
+(458685832--458685882) +(458686471--458686507)  
+(458686600--458686604) +(458687112--458687115) +458687240 +458687368  
+(458687880--458688062) +(458688264--458688265)  
+(458688519--458688552) +(458689928--458690083)  
+(458690567--458690602) +458690978 +(458691976--458693464)  
+(458693510--458693514) +458693638 +(458693766--458693769) +458693894  
+(458694024--458694652) +(458694663--458694696)  
+(458696072--458705014) +458705160 +458705288 +(458705416--458705473)  
+(458706312--458706320) +(458706951--458706984)  
+(458708999--458709032) +(458711047--458711080)  
+(458713095--458713128) +(458715143--458715176)  
+(458717191--458717224) +(458719239--458719272) +458720616  
+(458721287--458721320) +(458721416--458721421) +458721544 +458722056  
+(458722184--458722187) +(458722696--458723254)  
+(458723335--458723368) +458723976 +(458724360--458724361)  
+(458725383--458725416) +(458725896--458725965)  
+(458727431--458727464) +(458727942--458728837)  
+(458729479--458729512) +(458731527--458731560)  
+(458733575--458733703) +(458734984--458739136)  
+(458739719--458739752) +(458741767--458741800)  
+(458743815--458743848) +(458745863--458745896)  
+(458747911--458747944) +(458749959--458749992) +(458751368--458751999)
Fix<y>? yes

/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 9057/427278336 files (7.2% non-contiguous),  
1157126209/3418209536 blocks


dmesg (while running fsck.ext4) shows:


[84019.232630] Buffer I/O error on device md0, logical block 458227712
[84019.232715] Buffer I/O error on device md0, logical block 458227712
[84024.149583] Buffer I/O error on device md0, logical block 458227713
[84024.149679] Buffer I/O error on device md0, logical block 458227713
[84025.073526] Buffer I/O error on device md0, logical block 458227714
[84025.073617] Buffer I/O error on device md0, logical block 458227715
[84025.073688] Buffer I/O error on device md0, logical block 458227716
[84025.073765] Buffer I/O error on device md0, logical block 458227714
[84026.571139] Buffer I/O error on device md0, logical block 458227715
[84027.654387] Buffer I/O error on device md0, logical block 458227717
[84027.654474] Buffer I/O error on device md0, logical block 458227718
[84027.654549] Buffer I/O error on device md0, logical block 458227719
[84027.654617] Buffer I/O error on device md0, logical block 458227720
[84027.654684] Buffer I/O error on device md0, logical block 458227721
[84030.577188] quiet_error: 8 callbacks suppressed
[84030.577190] Buffer I/O error on device md0, logical block 458227720
[84031.233856] Buffer I/O error on device md0, logical block 458227721
[84031.907058] Buffer I/O error on device md0, logical block 458227722
[84032.534278] Buffer I/O error on device md0, logical block 458227723
[84033.186672] Buffer I/O error on device md0, logical block 458227724
[84033.847581] Buffer I/O error on device md0, logical block 458227725
[84034.453947] Buffer I/O error on device md0, logical block 458227726
[84035.073116] Buffer I/O error on device md0, logical block 458227727
[84068.605347] Buffer I/O error on device md0, logical block 458227712
[84068.605427] lost page write due to I/O error on md0
[84068.605439] Buffer I/O error on device md0, logical block 458227713
[84068.605519] lost page write due to I/O error on md0
[84068.605528] Buffer I/O error on device md0, logical block 458227714
[84068.605747] lost page write due to I/O error on md0
[84068.605757] Buffer I/O error on device md0, logical block 458227715
[84068.605828] lost page write due to I/O error on md0
[84068.605837] Buffer I/O error on device md0, logical block 458227716
[84068.605910] lost page write due to I/O error on md0
[84068.605919] Buffer I/O error on device md0, logical block 458227717
[84068.605995] lost page write due to I/O error on md0
[84068.606048] Buffer I/O error on device md0, logical block 458227718
[84068.606217] lost page write due to I/O error on md0
[84068.606227] Buffer I/O error on device md0, logical block 458227719
[84068.606295] lost page write due to I/O error on md0
[84068.606327] Buffer I/O error on device md0, logical block 458227720
[84068.606398] lost page write due to I/O error on md0
[84068.606407] Buffer I/O error on device md0, logical block 458227721
[84068.606471] lost page write due to I/O error on md0
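
The block numbers in these messages match the ones e2fsck reports, so they can be converted into a position within the array. The sketch below is only a rough illustration under stated assumptions: 4096-byte blocks, the 512k chunk size and 16 raid devices shown in /proc/mdstat, and two parity chunks per raid6 stripe. The function name is hypothetical, and the sketch does not try to work out which member disk holds the chunk, since that depends on the left-symmetric parity rotation.

```python
FS_BLOCK = 4096                 # block size used by e2fsck and dd above
SECTOR = 512
CHUNK_BYTES = 512 * 1024        # "512k chunk" from /proc/mdstat
RAID_DEVICES = 16
DATA_DISKS = RAID_DEVICES - 2   # raid6: two parity chunks per stripe


def locate(block):
    """Map a filesystem block to (array sector, stripe, data chunk, offset)."""
    byte_offset = block * FS_BLOCK
    array_sector = byte_offset // SECTOR
    stripe_bytes = CHUNK_BYTES * DATA_DISKS
    stripe, in_stripe = divmod(byte_offset, stripe_bytes)
    chunk_index, in_chunk = divmod(in_stripe, CHUNK_BYTES)
    return array_sector, stripe, chunk_index, in_chunk


# First and last of the 16 failing blocks.
for block in (458227712, 458227727):
    print(block, locate(block))
```

Under these assumptions all 16 failing blocks fall in the first 64 KiB of a single data chunk of one stripe, which suggests one small damaged region rather than scattered errors.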


A resync reports no errors and finishes without problems:

[24406.670968] md: requested-resync of RAID array md0
[24406.670971] md: minimum _guaranteed_  speed: 1410065407 KB/sec/disk.
[24406.670973] md: using maximum available idle IO bandwidth (but not  
more than 1410065407 KB/sec) for requested-resync.
[24406.670981] md: using 128k window, over a total of 976631296k.
[33488.135225] md: md0: requested-resync done.


- Running debugfs:
root@nas3:/# debugfs /dev/md0
debugfs 1.42.10 (18-May-2014)
/dev/md0: Can't read a block bitmap while reading block bitmap
debugfs:

- This produces the same kind of errors in dmesg.


- The filesystem mounts and unmounts fine:

root@nas3:/# mount /dev/md0 /mnt
root@nas3:/# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             106G  5.3G   95G   6% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                  3.9G  196K  3.9G   1% /dev
tmpfs                 3.9G     0  3.9G   0% /dev/shm
tmpfs                 3.9G     0  3.9G   0% /tmp
/dev/md0               13T  4.3T  8.5T  34% /mnt

[84215.958792] EXT4-fs (md0): mounted filesystem with ordered data  
mode. Opts: (null)

root@nas3:/# umount /mnt

mdadm --examine /dev/sd[bcdefghijklmnopqr] >> raid.status


/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : b56fa722:c5be1eda:5b3e89cc:7199d266

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : e8a1ec1f - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : e72b076e:42886d45:8978e63b:b70c3c1b

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : c3171f37 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : a195ff09:a794b5fc:7c830670:bcf450f1

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 208c8851 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 6ab2bcfc:872649a6:a053e0fe:94fe1fc3

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 1d8610fd - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdf:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : f4612be4:5e8b4db0:4e23f28d:e37d27b6

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 9112745e - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdg:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : e595d71c:c45d6fda:24a49338:2615328b

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 738c92c6 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 5
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdh:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 347fa638:4193adb2:4b8616d4:058fff18

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 90ea0da1 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 6
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdi:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 2f6ab7cb:3957ffa0:8b2decd2:b133cb5a

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 52ee087a - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 7
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdj:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : cd1cbc05:552bedbd:bf8f7be8:960afcd1

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 36a0c84e - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 8
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdk:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 4e352f48:398c4529:b39cd8c8:d5a14e7e

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 711be5ee - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 9
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdl:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 01e6c661:a4d8c466:84fd830c:dc3ec346

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : d452e0ec - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 10
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdm:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : aa22b86a:fb4effe6:8028a5ae:df01a2c2

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 7b7e81eb - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 11
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdn:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 8e0f1a50:50538cf7:c7553f75:22af1e8a

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : ff844db0 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 12
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdo:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : ea496b92:ac96fabc:23b5026a:30b0b80f

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 81a12bd0 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 13
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdp:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : 01173faa:f45adebc:9a1dc160:306641a2

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 229fdb9c - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 14
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
/dev/sdq:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x8
      Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
            Name : nas3:Datastore  (local to host nas3)
   Creation Time : Tue May 27 12:18:06 2014
      Raid Level : raid6
    Raid Devices : 16

  Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
      Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
   Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=432 sectors
           State : clean
     Device UUID : a7a6c77f:88c5d5d7:c330ab03:6cf98a83

     Update Time : Wed Jul  2 10:03:48 2014
   Bad Block Log : 512 entries available at offset 72 sectors - bad  
blocks present.
        Checksum : 97537c43 - correct
          Events : 1128363

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 15
    Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'  
== replacing)
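
In the output above, ten of the sixteen members (sdb, sde, sdf, sdh, sdk, sdl, sdm, sdn, sdo, sdq) report "bad blocks present" in their Bad Block Log and carry Feature Map 0x8. Whether those entries are what makes the reads fail is an assumption here, not something the output confirms. A small Python sketch for pulling that list out of a saved file such as raid.status follows. The function name and the two-device sample are hypothetical, and the regular expression only undoes the line wrapping seen in this mail.

```python
import re


def devices_with_bad_blocks(examine_text):
    """Return device names whose Bad Block Log line reports bad blocks."""
    # Rejoin lines that the mail client wrapped after "- bad".
    text = re.sub(r"- bad\s*\n\s*blocks present", "- bad blocks present",
                  examine_text)
    flagged = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):\s*$", line)
        if header:
            current = header.group(1)
        elif current and "Bad Block Log" in line and "bad blocks present" in line:
            flagged.append(current)
    return flagged


# Hypothetical two-device excerpt in the same shape as the output above.
sample = """/dev/sdb:
   Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
/dev/sdc:
   Bad Block Log : 512 entries available at offset 72 sectors
"""
print(devices_with_bad_blocks(sample))
```

mdadm also has an `--examine-badblocks` option that lists the entries recorded on a given device, which may help in comparing them with the failing block range.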












--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02  9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
@ 2014-07-02  9:52 ` Roman Mamedov
  2014-07-02 10:07   ` Pedro Teixeira
  2014-07-02 10:45 ` NeilBrown
  1 sibling, 1 reply; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02  9:52 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1531 bytes --]

On Wed, 02 Jul 2014 10:32:41 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:

> - I'm having the following problem on a raid6 md volume consisting og  
> 16 1TB Seagtes SSHD's. ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
> 
>   - every time I run a fsck.ext4 I will get the exact same errors (  
> ...short read ). Forcing a repair on the md0 volume shows no errors  
> and completes without problems. All disks are active and the volume is  
> not degraded, still I can't get rid of the short errors on those 16  
> blocks and when the filesystem is mounted the read errors will come up  
> from time to time as they are probably in use.

Are you sure that Ext4 in your kernel, and all tools that you use with it (such
as the fsck) really support 16 TB filesystems? I recall there have been some
semi-obvious problems with that. Try a different FS, e.g. XFS or Btrfs instead
of Ext4.

> - If I try to read those blocks with DD  ( dd if=/dev/md0  of=test.txt  
> seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file  
> but the file doesn't appear to have nothing on it ( and the file  
> doesn't take the 1.8T on disk as the disk is much smaller )

> root@nas3:/# dd if=/dev/md0  of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul  2 10:22 teste.txt

Here you need to use skip=, not seek=. See "man dd".
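
As a rough illustration of why the seek= run left a 1.8T file: seek= skips
blocks in the output file, so dd wrote the 6 blocks at offset
458227712 * 4096 in teste.txt and the filesystem left a sparse hole before
them. A minimal sketch of the arithmetic (the function name apparent_size is
made up for this example, and it needs a shell with 64-bit arithmetic):

```shell
# apparent_size SEEK COUNT BS: size in bytes of the file left behind by
# "dd seek=SEEK count=COUNT bs=BS" when the output file starts out empty.
apparent_size() {
    echo $(( ($1 + $2) * $3 ))
}

apparent_size 458227712 6 4096   # prints 1876900732928
```

That is about 1.7 TiB of apparent size, consistent with the 1.8T that
ls -lah reported, while only 24576 bytes were actually written.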

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02  9:52 ` Roman Mamedov
@ 2014-07-02 10:07   ` Pedro Teixeira
  2014-07-02 10:11     ` Roman Mamedov
  0 siblings, 1 reply; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:07 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-raid

Hi Roman,

    Thanks for the reply and the correction on the "dd" command.

    - ext4 is in the kernel as the fs wouldn't mount otherwise and the  
tools are the latest ones ( e2fsprogs 1.42.10 )

    root@nas3:/# fsck.ext4 -V
    e2fsck 1.42.10 (18-May-2014)
    Using EXT2FS Library version 1.42.10, 18-May-2014

   - Running the corrected "dd" command ( dd if=/dev/md0  of=teste.txt
skip=458227712 count=16 bs=4096 ) produces the same dmesg errors and a
0-byte file.

dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000268007 s, 0.0 kB/s

[88623.524481] Buffer I/O error on device md0, logical block 458227712

  - I'm sure this is not a filesystem problem, but something fishy
with md. As all disks are active and in sync, if one had bad
sectors md should read the data from the other disks, but apparently it
is not doing that.

Cheers
Pedro



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 10:07   ` Pedro Teixeira
@ 2014-07-02 10:11     ` Roman Mamedov
  2014-07-02 10:37       ` Pedro Teixeira
  2014-07-02 11:03       ` Pedro Teixeira
  0 siblings, 2 replies; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02 10:11 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 291 bytes --]

On Wed, 02 Jul 2014 11:07:13 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:

> [88623.524481] Buffer I/O error on device md0, logical block 458227712

Ah sorry, I missed these messages quoted in the original mail. Then of
course, it is not an FS issue.

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 10:11     ` Roman Mamedov
@ 2014-07-02 10:37       ` Pedro Teixeira
  2014-07-02 11:03       ` Pedro Teixeira
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:37 UTC (permalink / raw)
  To: linux-raid

I also ran mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
raid.b and none of the bad blocks present on the disks are in the
range of the ones that are giving the read errors.

Also, is there a way to clear the badblocks list without destroying  
the filesystem?




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 10:11     ` Roman Mamedov
  2014-07-02 10:37       ` Pedro Teixeira
@ 2014-07-02 11:03       ` Pedro Teixeira
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:03 UTC (permalink / raw)
  To: linux-raid

Hi Neil,

"
Can't possibly happen!
(Do worry, I say that a lot - I'm usually wrong).
"

:)

- I'll simply do a dd if=/dev/md0 of=/dev/null and see what errors
show up. I will report back when it finishes.

- Debian squeeze x64, with custom 3.15.3 kernel and mdadm 3.3. The md  
volume was created with mdadm 3.3 and kernel 3.13 or 3.14 I think.

Cheers
Pedro


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02  9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
  2014-07-02  9:52 ` Roman Mamedov
@ 2014-07-02 10:45 ` NeilBrown
  2014-07-02 11:54   ` Pedro Teixeira
  1 sibling, 1 reply; 19+ messages in thread
From: NeilBrown @ 2014-07-02 10:45 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2948 bytes --]

On Wed, 02 Jul 2014 10:32:41 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> - I'm having the following problem on a raid6 md volume consisting og  
> 16 1TB Seagtes SSHD's. ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
> 
>   - every time I run a fsck.ext4 I will get the exact same errors (  
> ...short read ). Forcing a repair on the md0 volume shows no errors  
> and completes without problems. All disks are active and the volume is  
> not degraded, still I can't get rid of the short errors on those 16  
> blocks and when the filesystem is mounted the read errors will come up  
> from time to time as they are probably in use.
> 
> - If I try to read those blocks with DD  ( dd if=/dev/md0  of=test.txt  
> seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file  
> but the file doesn't appear to have nothing on it ( and the file  
> doesn't take the 1.8T on disk as the disk is much smaller )
> 
> - this started happening after having a three disk failure. I  
> recovered from that failure by recreating the array with the  
> non-failed 13 disks plus the last failed one ( events didn't differ  
> much ). I then readed the other disks. The failed disks are all  
> physically good, tested them with hdat2 and they don't have read/write  
> errors so I reused them. I don't know why they failed, maybe some  
> incompatibility with SSHD's and the LSI HBA controller..
> 
> root@nas3:/# dd if=/dev/md0  of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul  2 10:22 teste.txt
> root@nas3:/#
> 
> 
> 
> root@nas3:/# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]  
> sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
>        13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2  
> [16/16] [UUUUUUUUUUUUUUUU]
> 
> - When doing a fsck.ext4 of /dev/md0 it returns the following ( and I  
> can do it over and over again with the exact same errors) :
> 
> root@nas3:/# fsck.ext4 -f /dev/md0
> e2fsck 1.42.10 (18-May-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Error reading block 458227712 (Attempt to read block from filesystem  
> resulted in short read) while reading inode and block bitmaps.  Ignore  
> error<y>? yes


Can't possibly happen!

(Do worry, I say that a lot - I'm usually wrong).

What sort of computer?  Particularly is it 32bit or 64bit?

Try using 'dd' to read a few meg at various offsets (1G, 2G, 4G, 6G, 8G, ....)
and find out if there is a pattern, where it can read and where it cannot.
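
One way to script such a probe, as a sketch only: the helper name probe is
made up for this example, bs=1M assumes GNU dd, and it reads 4 MiB at each
offset given in GiB.

```shell
# probe DEV OFFSET_GIB...: read 4 MiB at each offset and report the result.
probe() {
    dev=$1
    shift
    for gib in "$@"; do
        # bs=1M, so skip= counts MiB; gib * 1024 converts GiB to MiB
        if dd if="$dev" of=/dev/null bs=1M skip=$((gib * 1024)) count=4 2>/dev/null; then
            echo "${gib}G: ok"
        else
            echo "${gib}G: READ ERROR"
        fi
    done
}
```

For example, "probe /dev/md0 1 2 4 6 8" would show which of those offsets
read cleanly and which return an I/O error.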

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 10:45 ` NeilBrown
@ 2014-07-02 11:54   ` Pedro Teixeira
       [not found]     ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are  
seagate sshd ST1000DX001.

So I ran the "dd if=/dev/md0 of=/dev/null  bs=4096" and it failed in
a lot of places. I had to restart the command several times with the
skip parameter set to a couple of blocks after the last block error.
It ran for about 1.5TB of the total 13TB of the volume.
The md volume didn't drop any drive when running this.
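
Instead of restarting by hand, dd can be told to carry on after a read
error with conv=noerror. A sketch, assuming GNU dd (the helper name scan is
made up for this example); the failing blocks would still be logged in
dmesg:

```shell
# scan DEV: read the whole device in 4 KiB blocks, continuing past
# unreadable blocks instead of stopping at the first I/O error.
scan() {
    dd if="$1" of=/dev/null bs=4096 conv=noerror
}
```

It would be run as "scan /dev/md0".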

dmesg showed:

[ 1678.478156] Buffer I/O error on device md0, logical block 196012546
[ 1678.478314] Buffer I/O error on device md0, logical block 196012547
[ 1678.478462] Buffer I/O error on device md0, logical block 196012548
[ 1678.478737] Buffer I/O error on device md0, logical block 196012549
[ 1678.479077] Buffer I/O error on device md0, logical block 196012550
[ 1678.479415] Buffer I/O error on device md0, logical block 196012551
[ 1678.479754] Buffer I/O error on device md0, logical block 196012552
[ 1678.480082] Buffer I/O error on device md0, logical block 196012553
[ 1678.480679] Buffer I/O error on device md0, logical block 196012630
[ 1678.480811] Buffer I/O error on device md0, logical block 196012758
[ 2305.139382] quiet_error: 369 callbacks suppressed
[ 2305.139385] Buffer I/O error on device md0, logical block 196012759
[ 2310.592687] Buffer I/O error on device md0, logical block 196012760
[ 2313.135470] Buffer I/O error on device md0, logical block 196012761
[ 2315.971196] Buffer I/O error on device md0, logical block 196012762
[ 2319.013647] Buffer I/O error on device md0, logical block 196012763
[ 2321.125008] Buffer I/O error on device md0, logical block 196012764
[ 2323.774654] Buffer I/O error on device md0, logical block 196012765
[ 2327.439527] Buffer I/O error on device md0, logical block 196012766
[ 2329.399068] Buffer I/O error on device md0, logical block 196012767
[ 2331.389823] Buffer I/O error on device md0, logical block 196012768
[ 2334.166786] Buffer I/O error on device md0, logical block 196012769
[ 2337.817145] Buffer I/O error on device md0, logical block 196012770
[ 2340.713005] Buffer I/O error on device md0, logical block 196012771
[ 2342.594948] Buffer I/O error on device md0, logical block 196012772
[ 2344.678599] Buffer I/O error on device md0, logical block 196012773
[ 2347.150423] Buffer I/O error on device md0, logical block 196012774
[ 2349.433777] Buffer I/O error on device md0, logical block 196012775
[ 2351.559728] Buffer I/O error on device md0, logical block 196012776
[ 2353.650886] Buffer I/O error on device md0, logical block 196012777
[ 2385.719365] Buffer I/O error on device md0, logical block 196012778
[ 2388.937566] Buffer I/O error on device md0, logical block 196012779
[ 2391.831046] Buffer I/O error on device md0, logical block 196012780
[ 2393.971170] Buffer I/O error on device md0, logical block 196012781
[ 2396.118172] Buffer I/O error on device md0, logical block 196012782
[ 2399.717491] Buffer I/O error on device md0, logical block 196012783
[ 2401.913373] Buffer I/O error on device md0, logical block 196012784
[ 2403.892253] Buffer I/O error on device md0, logical block 196012785
[ 2405.796383] Buffer I/O error on device md0, logical block 196012786
[ 2408.171017] Buffer I/O error on device md0, logical block 196012787
[ 2410.233107] Buffer I/O error on device md0, logical block 196012788
[ 2413.184341] Buffer I/O error on device md0, logical block 196012789
[ 2416.396825] Buffer I/O error on device md0, logical block 196012790
[ 2420.734772] Buffer I/O error on device md0, logical block 196012890
[ 2426.320297] Buffer I/O error on device md0, logical block 196013570
[ 2426.320397] Buffer I/O error on device md0, logical block 196013571
[ 2426.320504] Buffer I/O error on device md0, logical block 196013572
[ 2426.320595] Buffer I/O error on device md0, logical block 196013573
[ 2426.320686] Buffer I/O error on device md0, logical block 196013574
[ 2426.320778] Buffer I/O error on device md0, logical block 196013575
[ 2426.320877] Buffer I/O error on device md0, logical block 196013576
[ 2426.321024] Buffer I/O error on device md0, logical block 196013577
[ 2426.321193] Buffer I/O error on device md0, logical block 196013578
[ 2436.240507] quiet_error: 119 callbacks suppressed
[ 2436.240509] Buffer I/O error on device md0, logical block 196012900
[ 2440.078873] Buffer I/O error on device md0, logical block 196012910
[ 2442.323624] Buffer I/O error on device md0, logical block 196012920
[ 2445.852897] Buffer I/O error on device md0, logical block 196013570
[ 2454.009848] Buffer I/O error on device md0, logical block 196013570
[ 2456.810436] Buffer I/O error on device md0, logical block 196013570
[ 2461.672818] Buffer I/O error on device md0, logical block 196014336
[ 2461.672901] Buffer I/O error on device md0, logical block 196014464
[ 2461.672985] Buffer I/O error on device md0, logical block 196014337
[ 2461.673109] Buffer I/O error on device md0, logical block 196014465
[ 2461.695280] Buffer I/O error on device md0, logical block 196014592
[ 2461.695371] Buffer I/O error on device md0, logical block 196014720
[ 2461.695458] Buffer I/O error on device md0, logical block 196014593
[ 2461.695548] Buffer I/O error on device md0, logical block 196014721
[ 2461.695633] Buffer I/O error on device md0, logical block 196014336
[ 2465.937036] Buffer I/O error on device md0, logical block 196125442
[ 2538.797979] quiet_error: 252 callbacks suppressed
[ 2538.797982] Buffer I/O error on device md0, logical block 217780096
[ 2538.798084] Buffer I/O error on device md0, logical block 217780097
[ 2538.798163] Buffer I/O error on device md0, logical block 217780098
[ 2538.798240] Buffer I/O error on device md0, logical block 217780099
[ 2538.798321] Buffer I/O error on device md0, logical block 217780100
[ 2538.798404] Buffer I/O error on device md0, logical block 217780101
[ 2538.798486] Buffer I/O error on device md0, logical block 217780102
[ 2538.798569] Buffer I/O error on device md0, logical block 217780103
[ 2538.798681] Buffer I/O error on device md0, logical block 217780104
[ 2538.798812] Buffer I/O error on device md0, logical block 217780105
[ 2582.229715] quiet_error: 607 callbacks suppressed
[ 2582.229717] Buffer I/O error on device md0, logical block 217780106
[ 2584.667289] Buffer I/O error on device md0, logical block 217780107
[ 2590.211304] Buffer I/O error on device md0, logical block 228358304
[ 2590.211388] Buffer I/O error on device md0, logical block 228358432
[ 2590.211467] Buffer I/O error on device md0, logical block 228358560
[ 2590.211555] Buffer I/O error on device md0, logical block 228358305
[ 2590.211628] Buffer I/O error on device md0, logical block 228358433
[ 2590.211712] Buffer I/O error on device md0, logical block 228358561
[ 2590.211792] Buffer I/O error on device md0, logical block 228358306
[ 2590.211871] Buffer I/O error on device md0, logical block 228358434
[ 2590.211945] Buffer I/O error on device md0, logical block 228358562
[ 2590.212025] Buffer I/O error on device md0, logical block 228358307
[ 2652.455446] quiet_error: 375 callbacks suppressed
[ 2652.455449] Buffer I/O error on device md0, logical block 260370751
[ 2652.455541] Buffer I/O error on device md0, logical block 260370752
[ 2652.455618] Buffer I/O error on device md0, logical block 260370753
[ 2652.455694] Buffer I/O error on device md0, logical block 260370754
[ 2652.455779] Buffer I/O error on device md0, logical block 260370755
[ 2652.455853] Buffer I/O error on device md0, logical block 260370756
[ 2652.455930] Buffer I/O error on device md0, logical block 260370757
[ 2652.456003] Buffer I/O error on device md0, logical block 260370758
[ 2652.456090] Buffer I/O error on device md0, logical block 260370759
[ 2652.456166] Buffer I/O error on device md0, logical block 260370760
[ 2695.663954] quiet_error: 56 callbacks suppressed
[ 2695.663957] Buffer I/O error on device md0, logical block 262508480
[ 2695.664039] Buffer I/O error on device md0, logical block 262508608
[ 2695.664113] Buffer I/O error on device md0, logical block 262508736
[ 2695.664197] Buffer I/O error on device md0, logical block 262508481
[ 2695.664264] Buffer I/O error on device md0, logical block 262508609
[ 2695.664344] Buffer I/O error on device md0, logical block 262508737
[ 2695.664417] Buffer I/O error on device md0, logical block 262508482
[ 2695.664489] Buffer I/O error on device md0, logical block 262508610
[ 2695.664557] Buffer I/O error on device md0, logical block 262508738
[ 2695.664632] Buffer I/O error on device md0, logical block 262508483
[ 2980.623591] quiet_error: 312 callbacks suppressed
[ 2980.623595] Buffer I/O error on device md0, logical block 370515910
[ 2980.623676] Buffer I/O error on device md0, logical block 370516038
[ 2980.623761] Buffer I/O error on device md0, logical block 370515911
[ 2980.623828] Buffer I/O error on device md0, logical block 370516039
[ 2980.623903] Buffer I/O error on device md0, logical block 370515912
[ 2980.623970] Buffer I/O error on device md0, logical block 370516040
[ 2980.624046] Buffer I/O error on device md0, logical block 370515913
[ 2980.624119] Buffer I/O error on device md0, logical block 370516041
[ 2980.624191] Buffer I/O error on device md0, logical block 370515914
[ 2980.624262] Buffer I/O error on device md0, logical block 370516042
[ 3005.209442] quiet_error: 281 callbacks suppressed
[ 3005.209444] Buffer I/O error on device md0, logical block 370516043
[ 3010.575774] Buffer I/O error on device md0, logical block 372582176
[ 3010.575854] Buffer I/O error on device md0, logical block 372582304
[ 3010.575927] Buffer I/O error on device md0, logical block 372582432
[ 3010.576004] Buffer I/O error on device md0, logical block 372582177
[ 3010.576082] Buffer I/O error on device md0, logical block 372582305
[ 3010.576147] Buffer I/O error on device md0, logical block 372582433
[ 3010.576232] Buffer I/O error on device md0, logical block 372582178
[ 3010.576298] Buffer I/O error on device md0, logical block 372582306
[ 3010.576361] Buffer I/O error on device md0, logical block 372582434
[ 3024.205000] quiet_error: 472 callbacks suppressed
[ 3024.205003] Buffer I/O error on device md0, logical block 375180000
[ 3024.205082] Buffer I/O error on device md0, logical block 375180128
[ 3024.205154] Buffer I/O error on device md0, logical block 375180256
[ 3024.205229] Buffer I/O error on device md0, logical block 375180001
[ 3024.205308] Buffer I/O error on device md0, logical block 375180129
[ 3024.205374] Buffer I/O error on device md0, logical block 375180257
[ 3024.205441] Buffer I/O error on device md0, logical block 375180002
[ 3024.205509] Buffer I/O error on device md0, logical block 375180130
[ 3024.205581] Buffer I/O error on device md0, logical block 375180258
[ 3024.205655] Buffer I/O error on device md0, logical block 375180003
[ 3182.726623] quiet_error: 183 callbacks suppressed
[ 3182.726626] Buffer I/O error on device md0, logical block 434495873
[ 3182.726708] Buffer I/O error on device md0, logical block 434495874
[ 3182.726787] Buffer I/O error on device md0, logical block 434495875
[ 3182.726857] Buffer I/O error on device md0, logical block 434495876
[ 3182.726927] Buffer I/O error on device md0, logical block 434495877
[ 3182.727036] Buffer I/O error on device md0, logical block 434495878
[ 3182.727129] Buffer I/O error on device md0, logical block 434495879
[ 3182.727210] Buffer I/O error on device md0, logical block 434495880
[ 3182.727292] Buffer I/O error on device md0, logical block 434495881
[ 3182.727374] Buffer I/O error on device md0, logical block 434495882
[ 3201.149784] quiet_error: 118 callbacks suppressed
[ 3201.149786] Buffer I/O error on device md0, logical block 434495883
[ 3243.707353] Buffer I/O error on device md0, logical block 458225568
[ 3243.707439] Buffer I/O error on device md0, logical block 458225569
[ 3243.707526] Buffer I/O error on device md0, logical block 458225570
[ 3243.707600] Buffer I/O error on device md0, logical block 458225571
[ 3243.707675] Buffer I/O error on device md0, logical block 458225572
[ 3243.707748] Buffer I/O error on device md0, logical block 458225573
[ 3243.707825] Buffer I/O error on device md0, logical block 458225574
[ 3243.707903] Buffer I/O error on device md0, logical block 458225575
[ 3243.707975] Buffer I/O error on device md0, logical block 458225576
[ 3410.602968] quiet_error: 139 callbacks suppressed
[ 3410.602971] Buffer I/O error on device md0, logical block 490875483
[ 3410.603049] Buffer I/O error on device md0, logical block 490875611
[ 3410.603126] Buffer I/O error on device md0, logical block 490875484
[ 3410.603204] Buffer I/O error on device md0, logical block 490875612
[ 3410.603279] Buffer I/O error on device md0, logical block 490875485
[ 3410.603349] Buffer I/O error on device md0, logical block 490875613
[ 3410.603424] Buffer I/O error on device md0, logical block 490875486
[ 3410.603509] Buffer I/O error on device md0, logical block 490875614
[ 3410.603592] Buffer I/O error on device md0, logical block 490875487
[ 3410.603663] Buffer I/O error on device md0, logical block 490875615
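
To compare these logical block numbers with the per-device bad-block lists,
the units have to be converted: the dmesg lines count 4 KiB blocks on md0,
while --examine-badblocks lists 512-byte sectors on each member. A rough
sketch of the conversion, using the values from the --examine output (16
devices, so 14 data chunks per stripe, 512K chunk, data offset 262144
sectors) and assuming that the bad-block entries are counted from the start
of the member device, data offset included. The function name is made up for
this example, it needs a shell with 64-bit arithmetic, and it only gives the
offset within a member, not which member holds the block:

```shell
# md_block_to_dev_sector BLOCK: map a 4 KiB logical block of md0 to the
# 512-byte sector it occupies on a member device of this array.
md_block_to_dev_sector() {
    block=$1
    data_disks=14        # 16 raid devices minus 2 parity (raid6)
    chunk_sectors=1024   # 512K chunk in 512-byte sectors
    data_offset=262144   # "Data Offset" from mdadm --examine
    array_sector=$((block * 8))
    stripe=$((array_sector / (chunk_sectors * data_disks)))
    in_chunk=$((array_sector % chunk_sectors))
    echo $((stripe * chunk_sectors + in_chunk + data_offset))
}

md_block_to_dev_sector 196012546   # prints 112269328
md_block_to_dev_sector 458227712   # prints 262106112
```

Under those assumptions the first block logged above, 196012546, maps to
sector 112269328, which is the first entry in the /dev/sdb list in this
message, and block 458227712 from the fsck error maps to 262106112, inside
the 512-sector /dev/sdb entry starting at 262105856.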


The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>   
raid.b" before and after running the "dd" command returned no changes:


Bad-blocks on /dev/sdb:
            112269328 for 512 sectors
            112269840 for 512 sectors
            112271376 for 512 sectors
            112271888 for 512 sectors
            112272400 for 512 sectors
            112272912 for 512 sectors
            112273424 for 512 sectors
            112273936 for 512 sectors
            112333840 for 512 sectors
            112334352 for 512 sectors
            112337680 for 128 sectors
            130752768 for 512 sectors
            130753280 for 512 sectors
            130755840 for 512 sectors
            130756352 for 512 sectors
            130757120 for 384 sectors
            149045752 for 512 sectors
            149046264 for 512 sectors
            212193536 for 512 sectors
            212194048 for 512 sectors
            248914952 for 512 sectors
            248915464 for 512 sectors
            262105344 for 512 sectors
            262105856 for 512 sectors
            273867480 for 512 sectors
            273867992 for 512 sectors
Bad-blocks list is empty in /dev/sdc
Bad-blocks list is empty in /dev/sdd
Bad-blocks on /dev/sde:
            114228480 for 512 sectors
            114228992 for 512 sectors
Bad-blocks on /dev/sdf:
            248545288 for 512 sectors
            248545800 for 512 sectors
            487421952 for 512 sectors
            487422464 for 512 sectors
            487422976 for 128 sectors
Bad-blocks list is empty in /dev/sdg
Bad-blocks on /dev/sdh:
            280763096 for 512 sectors
            280763608 for 512 sectors
Bad-blocks list is empty in /dev/sdi
Bad-blocks list is empty in /dev/sdj
Bad-blocks on /dev/sdk:
            124707840 for 512 sectors
            124708352 for 512 sectors
            124708864 for 512 sectors
            124709376 for 512 sectors
            124712192 for 384 sectors
            130771840 for 256 sectors
            130803968 for 512 sectors
            130804480 for 512 sectors
            130808960 for 256 sectors
            130852224 for 256 sectors
            130852608 for 256 sectors
            130853120 for 256 sectors
            130859520 for 256 sectors
            150267392 for 512 sectors
            150267904 for 512 sectors
            211985968 for 512 sectors
            211986480 for 512 sectors
            212037552 for 256 sectors
            212051504 for 512 sectors
            212052016 for 512 sectors
            213166336 for 512 sectors
            213166848 for 512 sectors
            213167360 for 512 sectors
            213167872 for 512 sectors
            213177600 for 512 sectors
            213178112 for 512 sectors
            214650624 for 512 sectors
            214651136 for 512 sectors
            249476104 for 512 sectors
            249476616 for 512 sectors
            262317312 for 512 sectors
            262317824 for 512 sectors
            262318464 for 512 sectors
            262318976 for 256 sectors
            262321408 for 512 sectors
            262321920 for 512 sectors
            714478672 for 512 sectors
            714479184 for 512 sectors
            714754128 for 512 sectors
            714754640 for 512 sectors
            714755152 for 512 sectors
            714755664 for 512 sectors
            935584432 for 512 sectors
            935584944 for 512 sectors
            940173568 for 512 sectors
            940174080 for 512 sectors
            976792224 for 512 sectors
            976792736 for 512 sectors
            976793248 for 512 sectors
            976793760 for 512 sectors
            980668064 for 512 sectors
            980668576 for 512 sectors
            980669088 for 512 sectors
            980669600 for 512 sectors
Bad-blocks on /dev/sdl:
            112269328 for 512 sectors
            112269840 for 512 sectors
            112271376 for 512 sectors
            112271376 for 512 sectors
            112271888 for 512 sectors
            112272400 for 512 sectors
            112272912 for 512 sectors
            112273424 for 512 sectors
            112273936 for 512 sectors
            112333840 for 512 sectors
            112334352 for 512 sectors
            112337680 for 128 sectors
            114228480 for 512 sectors
            114228992 for 512 sectors
            124707840 for 512 sectors
            124708352 for 512 sectors
            124708864 for 512 sectors
            124709376 for 512 sectors
            124712192 for 384 sectors
            130752768 for 512 sectors
            130753280 for 512 sectors
            130755840 for 512 sectors
            130756352 for 512 sectors
            130757120 for 384 sectors
            130771840 for 256 sectors
            130803968 for 512 sectors
            130804480 for 512 sectors
            130808960 for 256 sectors
            130852224 for 256 sectors
            130852608 for 256 sectors
            130853120 for 256 sectors
            130859520 for 256 sectors
            149045752 for 512 sectors
            149046264 for 512 sectors
            150267392 for 512 sectors
            150267904 for 512 sectors
            211985968 for 512 sectors
            211986480 for 512 sectors
            211996592 for 128 sectors
            212037552 for 256 sectors
            212051504 for 512 sectors
            212052016 for 512 sectors
            212193536 for 512 sectors
            212194048 for 512 sectors
            213166336 for 512 sectors
            213166848 for 512 sectors
            213167360 for 512 sectors
            213167872 for 512 sectors
            213177600 for 512 sectors
            213178112 for 512 sectors
            214650624 for 512 sectors
            214651136 for 512 sectors
            248545288 for 512 sectors
            248545800 for 512 sectors
            248914952 for 512 sectors
            248915464 for 512 sectors
            249476104 for 512 sectors
            249476616 for 512 sectors
            262105344 for 512 sectors
            262105856 for 512 sectors
            262317312 for 512 sectors
            262317824 for 512 sectors
            262318464 for 512 sectors
            262318976 for 256 sectors
            262321408 for 512 sectors
            262321920 for 512 sectors
            273867480 for 512 sectors
            273867992 for 512 sectors
            280763096 for 512 sectors
            280763608 for 512 sectors
            487421952 for 512 sectors
            487422464 for 512 sectors
            487422976 for 128 sectors
            714478672 for 512 sectors
            714479184 for 512 sectors
            714754128 for 512 sectors
            714754640 for 512 sectors
            714755152 for 512 sectors
            714755664 for 512 sectors
            935584432 for 512 sectors
            935584944 for 512 sectors
            940173568 for 512 sectors
            940174080 for 512 sectors
            976792224 for 512 sectors
            976792736 for 512 sectors
            976793248 for 512 sectors
            976793760 for 512 sectors
            980668064 for 512 sectors
            980668576 for 512 sectors
            980669088 for 512 sectors
            980669600 for 512 sectors
Bad-blocks on /dev/sdm:
            112269328 for 512 sectors
            112269840 for 512 sectors
            112271376 for 512 sectors
            112271888 for 512 sectors
            112272400 for 512 sectors
            112272912 for 512 sectors
            112273424 for 512 sectors
            112273936 for 512 sectors
            112333840 for 512 sectors
            112334352 for 512 sectors
            112337680 for 128 sectors
            114228480 for 512 sectors
            114228992 for 512 sectors
            124707840 for 512 sectors
            124708352 for 512 sectors
            124708864 for 512 sectors
            124709376 for 512 sectors
            124712192 for 384 sectors
            130752768 for 512 sectors
            130753280 for 512 sectors
            130755840 for 512 sectors
            130756352 for 512 sectors
            130757120 for 384 sectors
            130771840 for 256 sectors
            130803968 for 512 sectors
            130804480 for 512 sectors
            130808960 for 256 sectors
            130852224 for 256 sectors
            130852608 for 256 sectors
            130853120 for 256 sectors
            130859520 for 256 sectors
            149045752 for 512 sectors
            149046264 for 512 sectors
            150267392 for 512 sectors
            150267904 for 512 sectors
            211985968 for 512 sectors
            211986480 for 512 sectors
            211996592 for 128 sectors
            212037552 for 256 sectors
            212051504 for 512 sectors
            212052016 for 512 sectors
            212193536 for 512 sectors
            212194048 for 512 sectors
            213166336 for 512 sectors
            213166848 for 512 sectors
            213167360 for 512 sectors
            213167872 for 512 sectors
            213177600 for 512 sectors
            213178112 for 512 sectors
            214650624 for 512 sectors
            214651136 for 512 sectors
            248545288 for 512 sectors
            248545800 for 512 sectors
            248914952 for 512 sectors
            248915464 for 512 sectors
            249476104 for 512 sectors
            249476616 for 512 sectors
            262105344 for 512 sectors
            262105856 for 512 sectors
            262317312 for 512 sectors
            262317824 for 512 sectors
            262318464 for 512 sectors
            262318976 for 256 sectors
            262321408 for 512 sectors
            262321920 for 512 sectors
            273867480 for 512 sectors
            273867992 for 512 sectors
            280763096 for 512 sectors
            280763608 for 512 sectors
            487421952 for 512 sectors
            487422464 for 512 sectors
            487422976 for 128 sectors
            714478672 for 512 sectors
            714479184 for 512 sectors
            714754128 for 512 sectors
            714754640 for 512 sectors
            714755152 for 512 sectors
            714755664 for 512 sectors
            935584432 for 512 sectors
            935584944 for 512 sectors
            940173568 for 512 sectors
            940174080 for 512 sectors
            976792224 for 512 sectors
            976792736 for 512 sectors
            976793248 for 512 sectors
            976793760 for 512 sectors
            980668064 for 512 sectors
            980668576 for 512 sectors
            980669088 for 512 sectors
            980669600 for 512 sectors
Bad-blocks on /dev/sdn:
            112269328 for 512 sectors
            112269840 for 512 sectors
            112271376 for 512 sectors
            112271888 for 512 sectors
            112272400 for 512 sectors
            112272912 for 512 sectors
            112273424 for 512 sectors
            112273936 for 512 sectors
            112333840 for 512 sectors
            112334352 for 512 sectors
            112337680 for 128 sectors
            114228480 for 512 sectors
            114228992 for 512 sectors
            124707840 for 512 sectors
            124708352 for 512 sectors
            124708864 for 512 sectors
            124709376 for 512 sectors
            124712192 for 384 sectors
            130752768 for 512 sectors
            130753280 for 512 sectors
            130755840 for 512 sectors
            130756352 for 512 sectors
            130757120 for 384 sectors
            130771840 for 256 sectors
            130803968 for 512 sectors
            130804480 for 512 sectors
            130808960 for 256 sectors
            130852224 for 256 sectors
            130852608 for 256 sectors
            130853120 for 256 sectors
            130859520 for 256 sectors
            149045752 for 512 sectors
            149046264 for 512 sectors
            150267392 for 512 sectors
            150267904 for 512 sectors
            211985968 for 512 sectors
            211986480 for 512 sectors
            211996592 for 128 sectors
            212037552 for 256 sectors
            212051504 for 512 sectors
            212052016 for 512 sectors
            212193536 for 512 sectors
            212194048 for 512 sectors
            213166336 for 512 sectors
            213166848 for 512 sectors
            213167360 for 512 sectors
            213167872 for 512 sectors
            213177600 for 512 sectors
            213178112 for 512 sectors
            214650624 for 512 sectors
            214651136 for 512 sectors
            248545288 for 512 sectors
            248545800 for 512 sectors
            248914952 for 512 sectors
            248915464 for 512 sectors
            249476104 for 512 sectors
            249476616 for 512 sectors
            262105344 for 512 sectors
            262105856 for 512 sectors
            262317312 for 512 sectors
            262317824 for 512 sectors
            262318464 for 512 sectors
            262318976 for 256 sectors
            262321408 for 512 sectors
            262321920 for 512 sectors
            273867480 for 512 sectors
            273867992 for 512 sectors
            280763096 for 512 sectors
            280763608 for 512 sectors
            487421952 for 512 sectors
            487422464 for 512 sectors
            487422976 for 128 sectors
            714478672 for 512 sectors
            714479184 for 512 sectors
            714754128 for 512 sectors
            714754640 for 512 sectors
            714755152 for 512 sectors
            714755664 for 512 sectors
            935584432 for 512 sectors
            935584944 for 512 sectors
            940173568 for 512 sectors
            940174080 for 512 sectors
            976792224 for 512 sectors
            976792736 for 512 sectors
            976793248 for 512 sectors
            976793760 for 512 sectors
            980668064 for 512 sectors
            980668576 for 512 sectors
            980669088 for 512 sectors
            980669600 for 512 sectors
Bad-blocks on /dev/sdo:
            112269328 for 512 sectors
            112269840 for 512 sectors
            112271376 for 512 sectors
            112271888 for 512 sectors
            112272400 for 512 sectors
            112272912 for 512 sectors
            112273424 for 512 sectors
            112273936 for 512 sectors
            112333840 for 512 sectors
            112334352 for 512 sectors
            112337680 for 128 sectors
            114228480 for 512 sectors
            114228992 for 512 sectors
            124707840 for 512 sectors
            124708352 for 512 sectors
            124708864 for 512 sectors
            124709376 for 512 sectors
            124712192 for 384 sectors
            130752768 for 512 sectors
            130753280 for 512 sectors
            130755840 for 512 sectors
            130756352 for 512 sectors
            130757120 for 384 sectors
            130771840 for 256 sectors
            130803968 for 512 sectors
            130804480 for 512 sectors
            130808960 for 256 sectors
            130852224 for 256 sectors
            130852608 for 256 sectors
            130853120 for 256 sectors
            130859520 for 256 sectors
            149045752 for 512 sectors
            149046264 for 512 sectors
            150267392 for 512 sectors
            150267904 for 512 sectors
            211985968 for 512 sectors
            211986480 for 512 sectors
            211996592 for 128 sectors
            212037552 for 256 sectors
            212051504 for 512 sectors
            212052016 for 512 sectors
            212193536 for 512 sectors
            212194048 for 512 sectors
            213166336 for 512 sectors
            213166848 for 512 sectors
            213167360 for 512 sectors
            213167872 for 512 sectors
            213177600 for 512 sectors
            213178112 for 512 sectors
            214650624 for 512 sectors
            214651136 for 512 sectors
            248545288 for 512 sectors
            248545800 for 512 sectors
            248914952 for 512 sectors
            248915464 for 512 sectors
            249476104 for 512 sectors
            249476616 for 512 sectors
            262105344 for 512 sectors
            262105856 for 512 sectors
            262317312 for 512 sectors
            262317824 for 512 sectors
            262318464 for 512 sectors
            262318976 for 256 sectors
            262321408 for 512 sectors
            262321920 for 512 sectors
            273867480 for 512 sectors
            273867992 for 512 sectors
            280763096 for 512 sectors
            280763608 for 512 sectors
            487421952 for 512 sectors
            487422464 for 512 sectors
            487422976 for 128 sectors
            714478672 for 512 sectors
            714479184 for 512 sectors
            714754128 for 512 sectors
            714754640 for 512 sectors
            714755152 for 512 sectors
            714755664 for 512 sectors
            935584432 for 512 sectors
            935584944 for 512 sectors
            940173568 for 512 sectors
            940174080 for 512 sectors
            976792224 for 512 sectors
            976792736 for 512 sectors
            976793248 for 512 sectors
            976793760 for 512 sectors
            980668064 for 512 sectors
            980668576 for 512 sectors
            980669088 for 512 sectors
            980669600 for 512 sectors
Bad-blocks list is empty in /dev/sdp
Bad-blocks on /dev/sdq:
            211996592 for 128 sectors

________________________________________________________________________________
Message sent via AEIOU free email
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>]

* Re: strange problem with raid6 read errors on active non-degraded array
       [not found]     ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 14:14       ` Pedro Teixeira
  2014-07-02 14:55         ` Lars Täuber
  2014-07-02 16:35         ` Ethan Wilson
  0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 14:14 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-raid

Hi Lars,

the output of those commands:

root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
4096
root@nas3:/# cat /sys/block/md0/queue/physical_block_size
4096
root@nas3:/#

The strange thing here is that dmesg is not polluted with SATA errors
as it usually is when a hard disk has bad sectors or some other
hardware problem. The only messages in dmesg that hint at why reading
the md volume fails come from md itself.

Cheers
Pedro


Quoting Lars Täuber:
> Hi Pedro,
>
> Maybe this is an issue with the logical/physical block size?
> What do these commands report:
>
> cat /sys/block/sdb/queue/physical_block_size
> cat /sys/block/md0/queue/physical_block_size
>
> Seagate says these devices have 4096 bytes per sector.
>
> Lars




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 14:14       ` Pedro Teixeira
@ 2014-07-02 14:55         ` Lars Täuber
  2014-07-02 16:35         ` Ethan Wilson
  1 sibling, 0 replies; 19+ messages in thread
From: Lars Täuber @ 2014-07-02 14:55 UTC (permalink / raw)
  To: linux-raid

Hi Pedro,

Wed, 02 Jul 2014 15:14:06 +0100
Pedro Teixeira <finas@aeiou.pt> ==> Lars Täuber <taeuber@bbaw.de> :
> Hi Lars,
> 
> the output of those commands:
> 
> root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
> 4096
> root@nas3:/# cat /sys/block/md0/queue/physical_block_size
> 4096
> root@nas3:/#
> 
> The strange thing here is that dmesg is not polluted with SATA errors
> as it usually is when a hard disk has bad sectors or some other
> hardware problem. The only messages in dmesg that hint at why reading
> the md volume fails come from md itself.

Maybe it is because the controller and drive combination is not compatible.
Does the controller report any errors?
The LSI 9201-i16 compatibility list doesn't mention any 4k SATA drives.
Only three 4k SAS drives (from Seagate, though) are listed as compatible.

Maybe that's the cause?

Good luck
Lars



-- 
                            Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23                      10117 Berlin
Tel.: +49 30 20370-352           http://www.bbaw.de

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 14:14       ` Pedro Teixeira
  2014-07-02 14:55         ` Lars Täuber
@ 2014-07-02 16:35         ` Ethan Wilson
       [not found]           ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
  1 sibling, 1 reply; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 16:35 UTC (permalink / raw)
  To: Pedro Teixeira, Lars Täuber; +Cc: linux-raid

You have multiple bad-block lists (an MD feature) which are already full
of sectors. Those are earlier disk errors which were stored in the MD
headers (one list per drive).

MD will no longer try to read from such sectors. If the stripe does not
have enough good components left after excluding the bad blocks, MD
returns an error to the upper layers immediately. For example, raid5 can
tolerate bad blocks on at most 1 disk per stripe, so with bad blocks on
2 different disks in the same stripe MD returns a read error
immediately, without attempting the read.
That's why in dmesg you are seeing read errors from MD but not from the
component devices.
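
As a rough illustration of the rule described above (a simplified sketch,
not the actual MD code): a stripe can still be served as long as the
number of member devices with a bad block in that stripe does not exceed
the number of parity devices. The function name and its parameters are
hypothetical.

```python
def stripe_readable(devices_with_badblock: int, parity_devices: int) -> bool:
    """Return True if a stripe can still be reconstructed.

    Simplified model: every device that has a bad block recorded for this
    stripe is treated as missing, and the stripe survives only if the
    number of missing devices does not exceed the redundancy level.
    """
    return devices_with_badblock <= parity_devices


# raid5 has 1 parity device per stripe, raid6 has 2.
print(stripe_readable(1, parity_devices=1))  # raid5, one device affected
print(stripe_readable(2, parity_devices=1))  # raid5, two devices affected
print(stripe_readable(3, parity_devices=2))  # raid6, three devices affected
```

In the lists posted in this thread, at least three devices (sdm, sdn and
sdo) record the same sectors, which under this model is more than raid6
can reconstruct.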

Now the question is how so many bad blocks could have been recorded on
your array. It seems very unlikely that so many disks of your array are
in such bad shape. This might indicate an MD bug in the bad-block code.
I am thinking of some form of erroneous propagation of bad blocks: for
example, writing to an area where an MD bad block exists, instead of
clearing the bad block, could have propagated it to the other disks in
the same stripe. Something like that.
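
One way to look for this kind of propagation is to count how many devices
share each bad-block entry. The sketch below assumes the text format
printed by "mdadm --examine-badblocks" as it appears in this thread; the
function names are hypothetical and the sample input is abbreviated.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"^Bad-blocks on (\S+):\s*$")
EMPTY = re.compile(r"^Bad-blocks list is empty in (\S+)\s*$")
ENTRY = re.compile(r"^\s*(\d+) for (\d+) sectors\s*$")


def parse_badblocks(text):
    """Map each device name to a set of (start_sector, length) entries."""
    lists = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        empty = EMPTY.match(line)
        entry = ENTRY.match(line)
        if header:
            current = header.group(1)
            lists[current] = set()
        elif empty:
            lists[empty.group(1)] = set()
            current = None
        elif entry and current is not None:
            lists[current].add((int(entry.group(1)), int(entry.group(2))))
    return lists


def devices_per_entry(lists):
    """Map each (start_sector, length) entry to the devices recording it."""
    shared = defaultdict(list)
    for device, entries in lists.items():
        for item in entries:
            shared[item].append(device)
    return {item: sorted(devices) for item, devices in shared.items()}


sample = """\
Bad-blocks on /dev/sdm:
            112269328 for 512 sectors
            211996592 for 128 sectors
Bad-blocks on /dev/sdn:
            112269328 for 512 sectors
Bad-blocks list is empty in /dev/sdp
Bad-blocks on /dev/sdq:
            211996592 for 128 sectors
"""

for item, devices in sorted(devices_per_entry(parse_badblocks(sample)).items()):
    print(item, devices)
```

Entries recorded identically on more devices than the array has parity
disks are the ones that would make a stripe unreadable.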

See if you can check that writing to a bad block clears it. It will be
difficult to compute the correct offset to write to, though. You might
want to do some trial and error with dd together with blktrace. If you
can do that, you might also want to check that it behaves correctly
even when writing something that is not aligned to 512 bytes or 4k.
Obviously this test is destructive with respect to your data in that
location.

Another, easier test is to try to read with dd from a component device
itself. If MD has recorded a bad block there (even if that happened a
long time ago), the direct read with dd should also hit it, return an
error and stop, because bad blocks on the surface of a disk do not heal
by themselves with time.
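
The direct-read test could also be scripted so that it does not stop at
the first error. This is only a sketch: the function name is hypothetical,
it assumes the listed numbers are 512-byte sector offsets from the start
of the component device, and it reads through the page cache, so
previously cached data could hide an error.

```python
import os


def scan_sectors(path, start_sector, sector_count, sector_size=512):
    """Read each sector in the range and return the sectors that fail."""
    failed = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for sector in range(start_sector, start_sector + sector_count):
            try:
                # pread reads at an absolute byte offset without seeking.
                os.pread(fd, sector_size, sector * sector_size)
            except OSError:
                failed.append(sector)
    finally:
        os.close(fd)
    return failed


# Hypothetical usage (needs read access to the device):
# print(scan_sectors("/dev/sdb", 112269328, 512))
```

An empty result for a range that MD lists as bad would suggest the disk
surface itself is readable there.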

Another test is to read from md0 with dd from an area where you see that
only 1 disk has bad blocks (this probably requires some trial and error
with blktrace, because the offsets of md0 are not equal to the offsets
of the component devices). If MD works correctly, such a read should
"heal" the bad block: compute the data from parity on the other disks,
then write it over the bad block. The MD bad block should disappear.

The last 2 tests I described should not be destructive except in case of 
MD bugs.

EW


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>]

* Re: strange problem with raid6 read errors on active non-degraded array
       [not found]           ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
@ 2014-07-02 21:34             ` Ethan Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 21:34 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: Lars Täuber, linux-raid

On 02/07/2014 20:28, Pedro Teixeira wrote:
>
> Hi Ethan,
>
> The thing here is that some of the bad blocks ( if not all ) that are 
> giving read errors are not on the bad blocks list.
>

Are you sure? Please note that the offset is a complex topic, because an
offset given by fsck will be a sector offset in the md0 sense, while the
device bad-block list contains offsets in the device sense. To convert
one into the other you have to divide, or multiply, by the number of
data disks, approximately, and handle the remainder manually, also
taking the rotating parity into account. Not simple. Is this the
computation that you did?
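
To make the conversion concrete, here is a minimal sketch that maps a
device sector from a bad-block list to the range of md0 sectors covered
by the same stripe. The function name is hypothetical, and the defaults
are assumptions taken from the geometry quoted elsewhere in this thread
(512K chunks, 14 data disks in a 16-device raid6, data offset of 262144
sectors). It deliberately ignores the rotating layout, so it gives the
stripe only, not the exact chunk.

```python
def device_sector_to_array_range(device_sector, chunk_sectors=1024,
                                 data_disks=14, data_offset=262144):
    """Return (first, end) md0 sectors of the stripe holding device_sector.

    'end' is exclusive. All values are in 512-byte sectors.
    """
    if device_sector < data_offset:
        raise ValueError("sector lies before the data area")
    # Which stripe (row of chunks) on the device contains this sector.
    stripe = (device_sector - data_offset) // chunk_sectors
    # Each stripe holds data_disks chunks of array data.
    stripe_sectors = data_disks * chunk_sectors
    first = stripe * stripe_sectors
    return first, first + stripe_sectors


first, end = device_sector_to_array_range(112269328)
print(first, end)            # 1568100352 1568114688
print(first // 8, end // 8)  # the same range in 4096-byte blocks
```

Any fsck or dmesg block number that falls inside this range, once
multiplied by 8 to get sectors, belongs to the stripe that contains the
listed device sector.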

> Specifically, the ones that show up when doing a fsck are not on any
> drive's list. For these sectors fsck tries to rewrite them and md still
> throws an error, but they are not added to the list.
>

Not "added" but "removed". Writing to a bad block should create valid
content, so it should be removed from the list. If it is not, then
indeed there is probably a bug in the MD code; see my previous post.

> I replaced sdm with a new disk. This was one that had a bunch of bad
> blocks reported by md, and after finishing the rebuild ( with no
> errors at all ) the --examine-badblocks still gives me the exact same
> list of errors. I would expect that replacing the disk with a new one
> would clear the errors.
>

This is the correct behaviour by design.
Source disks did not have valid content in those positions, so good data 
cannot be created from nothing. Badblocks will be replicated onto the 
new disk.
"Bad" here is more a synonym of "containing invalid data", not really 
"unreadable surface".

> As I know the disks are good, is there any way of resetting the bad
> blocks list without destroying the filesystem?
>

This one I don't know but doing that would probably not help to find the 
bug.

Regards
EW

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 11:54   ` Pedro Teixeira
       [not found]     ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 16:43     ` John Stoffel
       [not found]       ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
  2014-07-03  2:40     ` NeilBrown
  2 siblings, 1 reply; 19+ messages in thread
From: John Stoffel @ 2014-07-02 16:43 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: NeilBrown, linux-raid

>>>>> "Pedro" == Pedro Teixeira <finas@aeiou.pt> writes:

Pedro> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are  
Pedro> seagate sshd ST1000DX001.

Pedro> So I ran the "dd if=/dev/md0 of=/dev/null  bs=4096" and it failed in
Pedro> a lot of places. I had to restart the command several times with the
Pedro> skip parameter set to a couple of blocks after the last block error.
Pedro> It ran for about 1.5TB of the total 13TB of the volume.
Pedro> The md volume didn't drop any drive when running this.

Can you destroy the filesystem and re-create the RAID6 from scratch by
any chance?  Or can you maybe create a smaller array with only 6
devices to run some tests?  

Can you provide more details on your ext4 filesystem using tune2fs?
Have you tried using XFS instead?  Does the filesystem have a logfile
or not?  And does a full fsck run to completion?  

Have you checked all the cables?  Do you have RAID firmware on the LSI
card by any chance, or are the disks set up as JBOD?  Could you have too
small a power supply, so that you're seeing corruption on the system due
to low voltage on one of the 5V or 12V rails?  Can you try powering half
the disks from another power supply as a test?

Do you have a graphics card in the system?  If so, can you pull it and
run it headless, or maybe put in a less power hungry card?  

John

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>]

* Re: strange problem with raid6 read errors on active non-degraded array
       [not found]       ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
@ 2014-07-02 18:41         ` Pedro Teixeira
  2014-07-02 19:01         ` John Stoffel
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 18:41 UTC (permalink / raw)
  To: John Stoffel, NeilBrown, linux-raid

  Hi John,

I can't destroy the fs at the moment.
The problem is not filesystem related, as md throws an error when
reading with dd while the filesystem is not mounted.
The controller is flashed with the latest P19 firmware in IT mode,
meaning that the disks are "passed through". No raid or jbod. The power
supply has a single 12V rail and a total output of 800W. The graphics
card is a PCIe x1 nvidia card.
I have a very similar machine that has the same case, the same power
supply, the same LSI controller in the same mode with the same
firmware, the same OS and the same kernel. The differences are the
motherboard ( Z87 chipset ) and i7 cpu, and the hard disks, which are
16x 4TB seagate HDD's in raid6 created the exact same way as this one
with mdadm 3.3. I have no problems with it.

Cheers
Pedro


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
       [not found]       ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
  2014-07-02 18:41         ` Pedro Teixeira
@ 2014-07-02 19:01         ` John Stoffel
  1 sibling, 0 replies; 19+ messages in thread
From: John Stoffel @ 2014-07-02 19:01 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: John Stoffel, NeilBrown, linux-raid

Pedro> I can't destroy the fs at the moment. The problem is not
Pedro> filesystem related as md throws an error when reading with dd
Pedro> when the filesystem is not mounted.

I hope you have backups of all this data, because I strongly suspect
you've run into either an MD coding problem, or you have the data
structures so confused that MD really needs to be re-built from
scratch.

Pedro> The controller is flashed with the latest P19 firmware IT mode,
Pedro> meaning that disks are "passed-through". No raid or jbod.

JBOD means Just a Bunch Of Disks, which is what you have, so good.  

Pedro> Power supply has a single 12v rail and total output of 800w.

Should be ok then.

Pedro> Graphics card is a pcie 1x nvidia card. I have a very similar
Pedro> machine, that has the same case, the same power supply, the
Pedro> same LSI controller in the same mode with the same firmware,
Pedro> same OS, same kernel. Differences are the motherboard Z87
Pedro> chipset and i7 cpu, and the hard disks are 16x 4TB seagate
Pedro> HDD's in raid6 created the exact same way as this one. I have
Pedro> no problems with it.

Hmm... so how did the system crash and lose the disk(s) in the first
place?  Did the cables get knocked?  Are they in a disk cage or hot
swap bays?  What kind of physical cabling are you using here?

The suggestion to use blktrace to examine how IO flows into the MD
device and then down into the various devices is a good one, but I
don't have any good suggestions on what to do here.

But in any case, I'll repeat this now.  Backup your data, and
basically assume some of it is toast and needs to be restored or
re-created if at all possible.  With all the errors you're showing,
there's bound to be major filesystem corruption and even undetected
corruption in some files on there.  Not a good place to be.

Too bad you can't just copy the data off to the other machine with the
16 x 4TB disks.  That would give you a good chance to save your data.

John

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-02 11:54   ` Pedro Teixeira
       [not found]     ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
  2014-07-02 16:43     ` John Stoffel
@ 2014-07-03  2:40     ` NeilBrown
  2014-07-03  8:29       ` Pedro Teixeira
                         ` (2 more replies)
  2 siblings, 3 replies; 19+ messages in thread
From: NeilBrown @ 2014-07-03  2:40 UTC (permalink / raw)
  To: Pedro Teixeira; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3599 bytes --]

On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are  
> seagate sshd ST1000DX001.
> 
> So I ran the "dd if=/dev/md0 of=/dev/null  bs=4096" and it failed in
> a lot of places. I had to restart the command several times with the
> skip parameter set to a couple of blocks after the last block error.
> It ran for about 1.5TB of the total 13TB of the volume.
> The md volume didn't drop any drive when running this.
> 
> dmesg showed:
> 
> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546

I love numbers, thanks.
The logical block size is 4096, or 8 sectors (1 sector is defined as 512
bytes), so this is at 
  196012546*8 == 1568100368 sectors into the array.

The array has a chunksize of 512K, or 1024 sectors so
 196012546*8/1024 = 1531348.015625

gives us the chunk number, and the remaining fraction of a chunk.

The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so to
find where the above chunk is stored we divide by 14

   1531348/14 = 109382.0000

So that is chunk 109382 on the first device (though with rotating data,
it might not be the very first).

Add back in the fractional part, multiply by 1024 sectors per chunk, and add
the Data Offset,

  109382.01562500*1024+262144 = 112269328

So it seems that sector 112269328 on some device is bad.
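
The same arithmetic can be written with integer division, which avoids
carrying the fractional chunk around. This is a sketch of the calculation
above, not MD's own mapping code: the function name is hypothetical, the
defaults are the values used in this message (4096-byte blocks, 1024
sectors per chunk, 14 data chunks per stripe, Data Offset 262144), and
the returned chunk index is only the position within the stripe's data,
which the rotating layout maps to different physical devices.

```python
def array_block_to_device_sector(logical_block, block_size=4096,
                                 chunk_sectors=1024, data_disks=14,
                                 data_offset=262144):
    """Return (device_sector, data_chunk_index) for an md0 logical block."""
    # Convert the filesystem-sized block into 512-byte array sectors.
    array_sector = logical_block * (block_size // 512)
    # Split into a chunk number and the offset inside that chunk.
    chunk, offset_in_chunk = divmod(array_sector, chunk_sectors)
    # Split the chunk number into a stripe and a position in the stripe.
    stripe, data_chunk_index = divmod(chunk, data_disks)
    device_sector = stripe * chunk_sectors + offset_in_chunk + data_offset
    return device_sector, data_chunk_index


print(array_block_to_device_sector(196012546))  # (112269328, 0)
```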

> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>   
> raid.b" before and after running the "dd" command returned no changes:
> 

I didn't notice the fact that the bad block logs were not empty before, sorry.
Anyway:...
> 
> Bad-blocks on /dev/sdb:
>             112269328 for 512 sectors

Look at that - exactly the number I calculated.  I love it when that works
out.

So the problem is exactly that some blocks are thought by md to be bad.

Blocks get recorded as bad (for raid6) when:

 - a 'read' reported an error which could not be fixed, either
   because the array was degraded so the data could not be recovered,
   or because the attempt to write restored data failed
 - when recovering a spare, if the data to be written cannot be found (due to
   errors on other devices)
 - when a 'write' request to a device fails

When your array had three failed devices, some reads and writes would have
failed.  Maybe that caused the bad blocks to be recorded.
What sort of device failures were they?  If the devices became completely
inaccessible, then it would not have been possible to record the bad block
information.

Can you describe the sequence of events that led to the three failures?
When you put the array back together, did you --create it, or --assemble
--force?

There isn't an easy way to remove the bad block list, as doing so is normally
asking for data corruption.
However it is probably justified in your case.
As it happens I included code in the kernel to make it possible to remove bad
blocks from the list - it was intended for testing only but I never removed
it.
If you run
  sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
  while read a; do
     echo "$a" > /sys/block/md0/md/dev-sdq/bad_blocks
  done

then it should clear all of the bad blocks recorded  on sdq.
You should probably fail/remove the last two devices that you added to the
array before you do this, as they probably don't have properly uptodate
information and doing this will cause corruption.
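
For illustration, the text transformation in that loop can be sketched as
follows. It assumes the bad_blocks file lists one "sector length" pair
per line (consistent with the "-211996592 128" output quoted later in
this thread); the function names are hypothetical, and whether a given
kernel accepts such writes is not something this sketch can guarantee.

```python
def clear_commands(bad_blocks_text):
    """Turn 'sector length' lines into '-sector length' clear requests."""
    commands = []
    for line in bad_blocks_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        sector, length = int(fields[0]), int(fields[1])
        commands.append(f"-{sector} {length}")
    return commands


def clear_bad_blocks(sysfs_path):
    """Write one clear request per recorded entry (needs root)."""
    with open(sysfs_path) as f:
        commands = clear_commands(f.read())
    for command in commands:
        # One entry per write, mirroring the shell loop.
        with open(sysfs_path, "w") as f:
            f.write(command + "\n")
    return commands


print(clear_commands("211996592 128\n"))
# Hypothetical usage:
# clear_bad_blocks("/sys/block/md0/md/dev-sdq/bad_blocks")
```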

I probably need to think about better ways to handle the bad block lists.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-03  2:40     ` NeilBrown
@ 2014-07-03  8:29       ` Pedro Teixeira
  2014-07-03 10:39       ` Pedro Teixeira
  2014-07-03 21:06       ` Pedro Teixeira
  2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03  8:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

    Thanks for the very informative answer, that nailed it, and Ethan  
was obviously onto it too!

    I tried running the commands you posted and it gives me an error:

    bb.sh
    "
      sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks
      while read; do
         echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
      done
    "
    root@nas3:~# ./bb.sh
    -211996592 128
    ./bb.sh: line 3: echo: write error: Invalid argument
    "
    Can you help me with this?

    I will clear all the bad blocks on all the drives and force a  
repair and see if some error shows up.  If not, I will then fsck the  
filesystem.

    I'm not sure how the volume failed. One Friday morning (last
month) I checked the system and everything was OK (no dmesg errors,
and mdstat reported all disks up). The following Monday I got a call
telling me that the volume was inaccessible. When I got back the next
Thursday, the machine had already been rebooted and the md0 volume had
three failed disks. I did an --examine, and two of them were completely
off from the non-failed disks in terms of event count. The other one
was much closer, but still a bit off, and not close enough to do an
--assemble --force, so I recreated the array with something like this:

"mdadm --create --assume-clean --level=6 --raid-devices=16  
--name=nas3:Datastore --uuid=9e97c588:59135324:c7d3fdf6:e543bdc3  
/dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdb /dev/sdf /dev/sdg  
/dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl missing /dev/sdn missing  
/dev/sdp /dev/sdq". I think the last failed drive was sdl or sdn,  
can't remember.


Then I cleared the superblocks on the missing disks and re-added them.
Then I fsck'd the filesystem and started getting those errors. I have
since replaced them with new disks and tested the old ones, only to
find that they report no SMART errors (SMART is enabled in the BIOS).
I also ran a read-write test on them and found them to be OK.

I will rebuild this machine next weekend, or the one after that, to
try to rule out a hardware problem or cabling issues, but I am
inclined to think it is related to the SSHDs.


Cheers
Pedro


    Quoting NeilBrown <neilb@suse.de>:
> On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:
>> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
>>    seagate sshd ST1000DX001.
>>
>>    So I ran the "dd if=/dev/md0 of=/dev/null  bs=4096" and it failed in
>>    a lot of places. I had to restart the command several times with the
>>    skip parameter set to a couple of blocks after the last block error.
>>    It ran for about 1.5TB of the total 13TB of the volume.
>>    The md volume didn't drop any drive when running this.
>>
>>    dmesg showed:
>>
>>    [ 1678.478156] Buffer I/O error on device md0, logical block 196012546
>   I love numbers, thanks.
>   The logical block size is 4096, or 8 sectors (1 sector is defined as 512
>   bytes), so this is at
>   196012546*8 == 1568100368 sectors into the array.
>
>   The array has a chunksize of 512K, or 1024 sectors so
>   196012546*8/1024 = 1531348.015625
>
>   gives us the chunk number, and the remaining fraction of a chunk.
>
>   The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so to
>   find where the above chunk is stored we divide by 14
>
>     1531348/14 = 109382.0000
>
>   So that is chunk 109382 on the first device (though with rotating data,
>   it might not be the very first).
>
>   Add back in the fractional part, multiply by 1024 sectors per chunk, and add
>   the Data Offset,
>
>   109382.01562500*1024+262144 = 112269328
>
>   So it seems that sector 112269328 on some device is bad.
>> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
>>    raid.b" before and after running the "dd" command returned no changes:
>   I didn't notice the fact that the bad block logs were not empty before,
>   sorry.
>   Anyway:...
>> Bad-blocks on /dev/sdb:
>>                112269328 for 512 sectors
>   Look at that - exactly the number I calculated.  I love it when that works
>   out.
>
>   So the problem is exactly that some blocks are thought by md to be bad.
>
>
>   Blocks get recorded as bad (for raid6) when:
>
>   - a 'read' reported an error which could not be fixed, either
>     because the array was degraded so the data could not be recovered,
>     or because the attempt to write restored data failed
>   - when recovering a spare, if the data to be written cannot be found
>     (due to errors on other devices)
>   - when a 'write' request to a device fails
>
>   When your array had three failed devices, some reads and writes would have
>   failed.  Maybe that caused the bad blocks to be recorded.
>   What sort of devices failures where they?  If the device became completely
>   inaccessible, then it would not have been possible to record the bad block
>   information.
>
>   Can you describe the sequence of events that led to the three failures?
>   When you put the array back together, did you --create it, or --assemble
>   --force?
>
>   There isn't an easy way to remove the bad block list, as doing so is normally
>   asking for data corruption.
>   However it is probably justified in your case.
>   As it happens I included code in the kernel to make it possible to remove bad
>   blocks from the list - it was intended for testing only but I never removed
>   it.
>   If you run
>   sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
>   while read; do
>       echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
>   done
>
>   then it should clear all of the bad blocks recorded  on sdq.
>   You should probably fail/remove the last two devices that you added to the
>   array before you do this, as they probably don't have properly up-to-date
>   information, and clearing the list with them still in the array will cause
>   corruption.
>
>   I probably need to think about better ways to handle the bad block lists.
>   NeilBrown
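
The sector arithmetic in the quoted message can be reproduced with integer shell arithmetic. This is only a sketch: the function name `md_device_sector` is invented here, the constants (8 sectors per 4096-byte block, 1024-sector chunks, 14 data chunks per stripe, Data Offset 262144) are the ones quoted for this array, and the rotating RAID6 layout is ignored, so the result gives the sector on a member device but not which device it is.

```shell
#!/bin/sh
# Map a 4096-byte logical block number on md0 to a sector on a member device.
md_device_sector() {
    block=$1
    sectors_per_block=8     # 4096 / 512
    chunk_sectors=1024      # 512K chunk
    data_chunks=14          # 16 devices minus 2 parity chunks per stripe
    data_offset=262144      # Data Offset, in sectors

    array_sector=$((block * sectors_per_block))
    chunk_number=$((array_sector / chunk_sectors))
    sector_in_chunk=$((array_sector % chunk_sectors))
    stripe=$((chunk_number / data_chunks))

    echo $((stripe * chunk_sectors + sector_in_chunk + data_offset))
}

md_device_sector 196012546   # prints 112269328
```

The integer remainder `sector_in_chunk` (16 sectors here) plays the role of the fractional part 0.015625 in the quoted calculation.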

________________________________________________________________________________
Message sent via AEIOU free email
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-03  2:40     ` NeilBrown
  2014-07-03  8:29       ` Pedro Teixeira
@ 2014-07-03 10:39       ` Pedro Teixeira
  2014-07-03 21:06       ` Pedro Teixeira
  2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 10:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I ended up understanding the command, but running it manually doesn't
work either. bad_blocks is cleared, but --examine-badblocks still shows
the entry, and after stopping and reassembling the md volume the bad
block shows up again.

root@nas3:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@nas3:~# mdadm --assemble /dev/md0
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: /dev/md0 has been started with 16 drives.
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
211996592 128
root@nas3:~# echo "-211996592 128" > /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# mdadm --examine-badblocks /dev/sdq
Bad-blocks on /dev/sdq:
            211996592 for 128 sectors
root@nas3:~#


So "cat /sys/block/md0/md/dev-sdq/bad_blocks" now shows no bad blocks,
but --examine-badblocks still lists it.
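
To compare the two views more easily, the `mdadm --examine-badblocks` output can be reduced to the same `sector length` form that the sysfs `bad_blocks` file uses. A rough sketch, assuming the output format shown in the transcript above; `examine_to_sysfs_form` is a made-up helper name.

```shell
#!/bin/sh
# Convert lines like "   211996592 for 128 sectors" to "211996592 128".
# Header lines such as "Bad-blocks on /dev/sdq:" are dropped.
examine_to_sysfs_form() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\) for \([0-9][0-9]*\) sectors.*$/\1 \2/p'
}

# Hypothetical usage, as root:
#   mdadm --examine-badblocks /dev/sdq | examine_to_sysfs_form
#   cat /sys/block/md0/md/dev-sdq/bad_blocks
```

If the first command still prints entries while the second prints nothing, the list md is using and the record read back from the device presumably differ, which matches what was observed here.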




* Re: strange problem with raid6 read errors on active non-degraded array
  2014-07-03  2:40     ` NeilBrown
  2014-07-03  8:29       ` Pedro Teixeira
  2014-07-03 10:39       ` Pedro Teixeira
@ 2014-07-03 21:06       ` Pedro Teixeira
  2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 21:06 UTC (permalink / raw)
  To: linux-raid

I was able to fix the volume and the filesystem!

  - The command Neil posted didn't work, but I got the idea and made a
script that cleared the list for all disks. --examine-badblocks still
listed the bad blocks, and stopping and assembling the volume again
repopulated the bad block list. Still, I cleared them all again and
issued a "repair" on the volume. I got a bunch of errors from a couple
of disks, mostly sdk and sdb, but the volume synced to the end, and
after stopping and assembling it again there were no bad blocks on any
disk, and --examine-badblocks also showed none. I have since replaced
sdk and sdb, with no errors when syncing and no errors in dmesg. After
that I fsck'd the filesystem, and it's up and running again. As soon as
I get replacements, I will replace the other two disks that exhibited
read errors during the repair.
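
The script used for this is not shown in the thread. A minimal sketch of the same idea, under stated assumptions: member devices appear as `dev-*` directories under the array's sysfs `md` directory, each with a `bad_blocks` file, and a repair pass is requested by writing `repair` to `sync_action`. The function names are invented for illustration.

```shell
#!/bin/sh
# Print "<device-dir> -<sector> <length>" for every recorded bad block.
list_removals() {
    md_dir=$1
    for dev in "$md_dir"/dev-*; do
        [ -r "$dev/bad_blocks" ] || continue
        sed "s|^|${dev##*/} -|" "$dev/bad_blocks"
    done
}

# Clear every entry on every member, then request a repair pass.
clear_all_and_repair() {
    md_dir=$1
    removals=$(list_removals "$md_dir")
    if [ -n "$removals" ]; then
        printf '%s\n' "$removals" | while read -r dev entry; do
            echo "$entry" > "$md_dir/$dev/bad_blocks"
        done
    fi
    echo repair > "$md_dir/sync_action"
}

# Hypothetical usage, as root:
#   clear_all_and_repair /sys/block/md0/md
```

Running `list_removals` on its own first gives a dry-run view of what would be written before anything is cleared.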

Thanks all for the help!!!

As a suggestion, I would make md distinguish a read error caused by no
good stripe being available due to the bad block list from other read
errors, to ease troubleshooting, and maybe implement a way to clear the
bad block list from disks with mdadm (and maybe force a resync of that
stripe after the list is cleared).


Cheers
Pedro

