* strange problem with raid6 read errors on active non-degraded array
@ 2014-07-02 9:32 Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:45 ` NeilBrown
0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 9:32 UTC (permalink / raw)
To: linux-raid
- I'm having the following problem on a raid6 md volume consisting of
16 1TB Seagate SSHDs ( using kernel 3.15.3 or 3.14.0 ). mdadm is 3.3.
- Every time I run fsck.ext4 I get the exact same errors (
...short read ). Forcing a repair on the md0 volume shows no errors
and completes without problems. All disks are active and the volume is
not degraded, yet I can't get rid of the short read errors on those 16
blocks, and when the filesystem is mounted the read errors come up
from time to time, probably because those blocks are in use.
- If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
seek=458227712 count=6 bs=4096 ) it instantly creates a 1.8T file,
but the file doesn't appear to have anything in it ( and the file
doesn't take 1.8T on disk, as the disk is much smaller ).
- This started happening after a three-disk failure. I
recovered from that failure by recreating the array with the
13 non-failed disks plus the last failed one ( the event counts didn't
differ much ). I then re-added the other disks. The failed disks are all
physically good: I tested them with hdat2 and they have no read/write
errors, so I reused them. I don't know why they failed, maybe some
incompatibility between the SSHDs and the LSI HBA controller.
root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
6+0 records in
6+0 records out
24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
root@nas3:/# ls -lah teste.txt
-rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
root@nas3:/#
root@nas3:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]
sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2
[16/16] [UUUUUUUUUUUUUUUU]
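The size reported here can be cross-checked with a little arithmetic. A minimal sketch, assuming the usual RAID6 rule that two members' worth of space per stripe goes to parity; `raid6_capacity_kib` is an illustrative name, and the per-device figure of 976631296 KiB is taken from the resync log line "over a total of 976631296k" further down:

```shell
# Hypothetical helper: usable RAID6 capacity in KiB.
# $1 = number of member devices, $2 = used size per device in KiB.
raid6_capacity_kib() {
    members=$1
    per_dev_kib=$2
    # RAID6 keeps two parity chunks per stripe, so members - 2 carry data.
    echo $(( (members - 2) * per_dev_kib ))
}

# 16 members of 976631296 KiB each; compare with the "blocks" figure above.
raid6_capacity_kib 16 976631296
```

The result, 13672838144, matches the block count in /proc/mdstat, so the array geometry itself looks consistent.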
- When doing a fsck.ext4 of /dev/md0 it returns the following ( and I
can do it over and over again with the exact same errors) :
root@nas3:/# fsck.ext4 -f /dev/md0
e2fsck 1.42.10 (18-May-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 458227712 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227713 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227714 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227715 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227716 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227717 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227718 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227719 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227720 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227721 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227722 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227723 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227724 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227725 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227726 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Error reading block 458227727 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes
Force rewrite<y>? yes
Block bitmap differences: +(458227712--458231839)
+(458234642--458235681) +(458244447--458245519)
+(458246454--458247229) +(458248461--458248750) +458250468 +458251108
+(458261280--458261284) +(458263296--458263297) +458263312 +458263328
+(458265376--458265379) +(458267392--458267394)
+(458269440--458269441) +458269456 +(458269472--458269474)
+(458271520--458271543) +(458273536--458273547)
+(458275584--458275585) +458275600 +458275616 +(458277664--458277669)
+(458279680--458279682) +(458281728--458281729)
+(458283776--458284059) +458285824 +458285837 +458285840
+(458285856--458285857) +(458287904--458287907)
+(458289920--458289922) +458291968 +458291984 +458292000
+(458294048--458294054) +(458296064--458296065) +458296080
+(458296096--458296116) +(458298144--458298169)
+(458300160--458300504) +(458302208--458302209) +458302224 +458302240
+(458304288--458304298) +(458306304--458307950)
+(458310400--458310401) +458310416 +458310432 +(458312480--458312483)
+458314496 +458316544 +458316550 +458317824 +(458321152--458321950)
+458321952 +458321954 +458321956 +458321958 +458321965 +458321981
+(458323981--458323986) +(458327296--458327297)
+(458328094--458328097) +(458331392--458331393)
+(458333440--458333441) +(458335488--458335489)
+(458337536--458337537) +458339584 +458339593 +458339595 +458339600
+458339616 +458341616 +(458343680--458343681) +(458345728--458345729)
+458347776 +458347792 +458347808 +458349808 +(458351872--458351874)
+458351888 +458351904 +458353904 +(458355968--458355969)
+(458356765--458356815) +(458359809--458360062) +458360064 +458360080
+458360096 +(458360113--458360120) +458362096 +(458364160--458364161)
+458364176 +458364192 +458366192 +(458368256--458368257) +458370304
+458370307 +(458373115--458373116) +458373119 +(458373127--458373160)
+(458375168--458379263) +(458379271--458379304)
+(458381319--458381352) +(458383360--458432511)
+(458433367--458433686) +(458434560--458514535)
+(458516480--458516488) +(458516496--458561535)
+(458561680--458565631) +(458565648--458574328)
+(458574416--458575982) +(458576912--458577167)
+(458577680--458579535) +(458579968--458582015)
+(458594304--458594585) +(458594632--458595592)
+(458595627--458595725) +(458595728--458596527)
+(458596545--458596687) +(458597423--458598607)
+(458598990--458602495) +(458602922--458603023)
+(458604256--458604623) +(458605072--458605135)
+(458605520--458605717) +(458605908--458608536)
+(458608642--458609662) +(458609680--458610704)
+(458610776--458613449) +(458613519--458615179)
+(458616265--458616831) +(458617702--458618383)
+(458618512--458619007) +(458619088--458619151)
+(458619896--458621625) +(458621648--458622175)
+(458622224--458622489) +(458622508--458622830)
+(458622848--458623129) +(458623162--458623345)
+(458623394--458623953) +(458623962--458624460)
+(458624896--458624975) +(458624986--458626127)
+(458626282--458627727) +(458627920--458629119)
+(458629195--458632207) +(458632695--458632841)
+(458633168--458633231) +(458633668--458633923)
+(458634370--458634621) +(458634646--458634660)
+(458634704--458635306) +(458635344--458636303)
+(458636734--458637311) +(458638356--458639359)
+(458639440--458640109) +(458640195--458645071)
+(458645178--458645503) +(458645776--458645922)
+(458646009--458646479) +(458646546--458647589)
+(458647696--458648655) +(458649040--458649807)
+(458650640--458651663) +(458652432--458653695)
+(458657064--458657199) +(458657792--458658625)
+(458658628--458658631) +(458658640--458659231)
+(458659513--458659748) +(458659792--458659882)
+(458660432--458661337) +(458661899--458663417)
+(458663760--458664083) +(458665232--458665295)
+(458665552--458665706) +(458665808--458668031)
+(458668240--458668855) +(458669126--458669127)
+(458669419--458670079) +(458674183--458674216) +458675464
+(458676231--458676267) +(458676360--458676370)
+(458676488--458676498) +458676616 +(458676744--458676754)
+(458676872--458676873) +458677000 +458677128 +(458677256--458677257)
+458677384 +458677512 +(458677640--458678410) +458678536 +458678664
+458678666 +(458678792--458678794) +458678920 +(458679048--458679049)
+458679306 +(458679688--458679770) +(458680327--458680360)
+(458681736--458681781) +(458682375--458682408)
+(458683784--458685154) +(458685192--458685193)
+(458685832--458685882) +(458686471--458686507)
+(458686600--458686604) +(458687112--458687115) +458687240 +458687368
+(458687880--458688062) +(458688264--458688265)
+(458688519--458688552) +(458689928--458690083)
+(458690567--458690602) +458690978 +(458691976--458693464)
+(458693510--458693514) +458693638 +(458693766--458693769) +458693894
+(458694024--458694652) +(458694663--458694696)
+(458696072--458705014) +458705160 +458705288 +(458705416--458705473)
+(458706312--458706320) +(458706951--458706984)
+(458708999--458709032) +(458711047--458711080)
+(458713095--458713128) +(458715143--458715176)
+(458717191--458717224) +(458719239--458719272) +458720616
+(458721287--458721320) +(458721416--458721421) +458721544 +458722056
+(458722184--458722187) +(458722696--458723254)
+(458723335--458723368) +458723976 +(458724360--458724361)
+(458725383--458725416) +(458725896--458725965)
+(458727431--458727464) +(458727942--458728837)
+(458729479--458729512) +(458731527--458731560)
+(458733575--458733703) +(458734984--458739136)
+(458739719--458739752) +(458741767--458741800)
+(458743815--458743848) +(458745863--458745896)
+(458747911--458747944) +(458749959--458749992) +(458751368--458751999)
Fix<y>? yes
/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 9057/427278336 files (7.2% non-contiguous),
1157126209/3418209536 blocks
dmesg ( while doing the fsck.ext4 ) shows:
[84019.232630] Buffer I/O error on device md0, logical block 458227712
[84019.232715] Buffer I/O error on device md0, logical block 458227712
[84024.149583] Buffer I/O error on device md0, logical block 458227713
[84024.149679] Buffer I/O error on device md0, logical block 458227713
[84025.073526] Buffer I/O error on device md0, logical block 458227714
[84025.073617] Buffer I/O error on device md0, logical block 458227715
[84025.073688] Buffer I/O error on device md0, logical block 458227716
[84025.073765] Buffer I/O error on device md0, logical block 458227714
[84026.571139] Buffer I/O error on device md0, logical block 458227715
[84027.654387] Buffer I/O error on device md0, logical block 458227717
[84027.654474] Buffer I/O error on device md0, logical block 458227718
[84027.654549] Buffer I/O error on device md0, logical block 458227719
[84027.654617] Buffer I/O error on device md0, logical block 458227720
[84027.654684] Buffer I/O error on device md0, logical block 458227721
[84030.577188] quiet_error: 8 callbacks suppressed
[84030.577190] Buffer I/O error on device md0, logical block 458227720
[84031.233856] Buffer I/O error on device md0, logical block 458227721
[84031.907058] Buffer I/O error on device md0, logical block 458227722
[84032.534278] Buffer I/O error on device md0, logical block 458227723
[84033.186672] Buffer I/O error on device md0, logical block 458227724
[84033.847581] Buffer I/O error on device md0, logical block 458227725
[84034.453947] Buffer I/O error on device md0, logical block 458227726
[84035.073116] Buffer I/O error on device md0, logical block 458227727
[84068.605347] Buffer I/O error on device md0, logical block 458227712
[84068.605427] lost page write due to I/O error on md0
[84068.605439] Buffer I/O error on device md0, logical block 458227713
[84068.605519] lost page write due to I/O error on md0
[84068.605528] Buffer I/O error on device md0, logical block 458227714
[84068.605747] lost page write due to I/O error on md0
[84068.605757] Buffer I/O error on device md0, logical block 458227715
[84068.605828] lost page write due to I/O error on md0
[84068.605837] Buffer I/O error on device md0, logical block 458227716
[84068.605910] lost page write due to I/O error on md0
[84068.605919] Buffer I/O error on device md0, logical block 458227717
[84068.605995] lost page write due to I/O error on md0
[84068.606048] Buffer I/O error on device md0, logical block 458227718
[84068.606217] lost page write due to I/O error on md0
[84068.606227] Buffer I/O error on device md0, logical block 458227719
[84068.606295] lost page write due to I/O error on md0
[84068.606327] Buffer I/O error on device md0, logical block 458227720
[84068.606398] lost page write due to I/O error on md0
[84068.606407] Buffer I/O error on device md0, logical block 458227721
[84068.606471] lost page write due to I/O error on md0
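Because the kernel repeats and rate-limits these messages, it can help to reduce the log to the distinct failing block numbers. A minimal sketch, assuming the message format shown above; `failing_blocks` is an illustrative name and the device name md0 is hard-coded to match this log:

```shell
# Hypothetical helper: read kernel log text on stdin and print each
# distinct failing logical block number once, in ascending order.
failing_blocks() {
    sed -n 's/.*Buffer I\/O error on device md0, logical block \([0-9][0-9]*\).*/\1/p' \
        | sort -n \
        | uniq
}

# Typical use (left as a comment so the sketch has no side effects):
#   dmesg | failing_blocks
```

Lines such as "lost page write due to I/O error" carry no block number and are ignored by the filter.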
Doing a resync brings no errors and finishes without problem:
[24406.670968] md: requested-resync of RAID array md0
[24406.670971] md: minimum _guaranteed_ speed: 1410065407 KB/sec/disk.
[24406.670973] md: using maximum available idle IO bandwidth (but not
more than 1410065407 KB/sec) for requested-resync.
[24406.670981] md: using 128k window, over a total of 976631296k.
[33488.135225] md: md0: requested-resync done.
- doing:
root@nas3:/# debugfs /dev/md0
debugfs 1.42.10 (18-May-2014)
/dev/md0: Can't read a block bitmap while reading block bitmap
debugfs:
- brings the same kind of errors to dmesg.
- filesystem mounts and unmounts fine:
root@nas3:/# mount /dev/md0 /mnt
root@nas3:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 106G 5.3G 95G 6% /
tmpfs 3.9G 0 3.9G 0% /lib/init/rw
udev 3.9G 196K 3.9G 1% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 0 3.9G 0% /tmp
/dev/md0 13T 4.3T 8.5T 34% /mnt
[84215.958792] EXT4-fs (md0): mounted filesystem with ordered data
mode. Opts: (null)
root@nas3:/# umount /mnt
mdadm --examine /dev/sd[bcdefghijklmnopqr] >> raid.status
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : b56fa722:c5be1eda:5b3e89cc:7199d266
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : e8a1ec1f - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : e72b076e:42886d45:8978e63b:b70c3c1b
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c3171f37 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : a195ff09:a794b5fc:7c830670:bcf450f1
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 208c8851 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 6ab2bcfc:872649a6:a053e0fe:94fe1fc3
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 1d8610fd - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : f4612be4:5e8b4db0:4e23f28d:e37d27b6
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 9112745e - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : e595d71c:c45d6fda:24a49338:2615328b
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 738c92c6 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdh:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 347fa638:4193adb2:4b8616d4:058fff18
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 90ea0da1 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdi:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 2f6ab7cb:3957ffa0:8b2decd2:b133cb5a
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 52ee087a - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdj:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : cd1cbc05:552bedbd:bf8f7be8:960afcd1
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 36a0c84e - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdk:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 4e352f48:398c4529:b39cd8c8:d5a14e7e
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 711be5ee - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdl:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 01e6c661:a4d8c466:84fd830c:dc3ec346
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : d452e0ec - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 10
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdm:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : aa22b86a:fb4effe6:8028a5ae:df01a2c2
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 7b7e81eb - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 11
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdn:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 8e0f1a50:50538cf7:c7553f75:22af1e8a
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : ff844db0 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 12
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdo:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : ea496b92:ac96fabc:23b5026a:30b0b80f
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 81a12bd0 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 13
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdp:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 01173faa:f45adebc:9a1dc160:306641a2
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 229fdb9c - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 14
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
/dev/sdq:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x8
Array UUID : 9e97c588:59135324:c7d3fdf6:e543bdc3
Name : nas3:Datastore (local to host nas3)
Creation Time : Tue May 27 12:18:06 2014
Raid Level : raid6
Raid Devices : 16
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 13672838144 (13039.43 GiB 14000.99 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : a7a6c77f:88c5d5d7:c330ab03:6cf98a83
Update Time : Wed Jul 2 10:03:48 2014
Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
Checksum : 97537c43 - correct
Events : 1128363
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 15
Array State : AAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R'
== replacing)
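In the output above, ten of the sixteen members (sdb, sde, sdf, sdh, sdk, sdl, sdm, sdn, sdo, sdq) report "bad blocks present". A minimal sketch for pulling that list out of `mdadm --examine` output; `devices_with_bad_blocks` is an illustrative name, and the filter assumes the unwrapped output, where "bad blocks present" sits on the same line as "Bad Block Log" (the mail client wrapped it here):

```shell
# Hypothetical helper: read `mdadm --examine` text on stdin and print the
# devices whose bad block log is reported as non-empty.
devices_with_bad_blocks() {
    awk '
        /^\/dev\//           { dev = $1; sub(/:$/, "", dev) }  # remember current device
        /bad blocks present/ { print dev }                     # flag it if its log has entries
    '
}

# Typical use (left as a comment so the sketch has no side effects):
#   mdadm --examine /dev/sd[b-q] | devices_with_bad_blocks
```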
________________________________________________________________________________
Message sent via the free AEIOU email service
http://www.aeiou.pt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
@ 2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:45 ` NeilBrown
1 sibling, 1 reply; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02 9:52 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 10:32:41 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:
> - I'm having the following problem on a raid6 md volume consisting of
> 16 1TB Seagate SSHDs ( using kernel 3.15.3 or 3.14.0 ). mdadm is 3.3.
>
> - Every time I run fsck.ext4 I get the exact same errors (
> ...short read ). Forcing a repair on the md0 volume shows no errors
> and completes without problems. All disks are active and the volume is
> not degraded, yet I can't get rid of the short read errors on those 16
> blocks, and when the filesystem is mounted the read errors come up
> from time to time, probably because those blocks are in use.
Are you sure that Ext4 in your kernel, and all tools that you use with it
(such as the fsck), really support 16 TB filesystems? I recall there have
been some semi-obvious problems with that. Try a different FS, e.g. XFS or
Btrfs instead of Ext4.
> - If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
> seek=458227712 count=6 bs=4096 ) it instantly creates a 1.8T file,
> but the file doesn't appear to have anything in it ( and the file
> doesn't take 1.8T on disk, as the disk is much smaller ).
> root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
Here you need to use skip=, not seek=. See "man dd".
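The 1.8T file size follows directly from what seek= does: it moves the position in the output file, so dd leaves a hole of seek * bs bytes before the copied data. A minimal sketch of the arithmetic, using the numbers from the command above; `dd_offset_bytes` is an illustrative name:

```shell
# Hypothetical helper: byte offset reached by seek=N or skip=N at block size bs.
# $1 = number of blocks, $2 = block size in bytes.
dd_offset_bytes() {
    blocks=$1
    bs=$2
    echo $(( blocks * bs ))
}

# seek=458227712 bs=4096 positions the OUTPUT file this far in ...
offset=$(dd_offset_bytes 458227712 4096)
# ... and count=6 then appends six blocks read from the START of /dev/md0.
size=$(( offset + 6 * 4096 ))
echo "sparse output file size: $size bytes"

# skip= moves the INPUT position instead, which is what was intended:
echo "dd if=/dev/md0 of=teste.txt skip=458227712 count=6 bs=4096"
```

That size is roughly 1.71 TiB, which GNU `ls -h` appears to round up to the 1.8T shown. The file is sparse, which is why it fits on a much smaller disk, and the six blocks it does contain come from the beginning of the array rather than from the failing region.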
--
With respect,
Roman
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:52 ` Roman Mamedov
@ 2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:11 ` Roman Mamedov
0 siblings, 1 reply; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:07 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-raid
Hi Roman,
Thanks for the reply and the correction on the "dd" command.
- ext4 is in the kernel, as the fs wouldn't mount otherwise, and the
tools are the latest ones ( e2fsprogs 1.42.10 ):
root@nas3:/# fsck.ext4 -V
e2fsck 1.42.10 (18-May-2014)
Using EXT2FS Library version 1.42.10, 18-May-2014
- Doing the correct "dd" command ( dd if=/dev/md0 of=teste.txt
skip=458227712 count=16 bs=4096 ) gives the same dmesg errors and a
0-byte file:
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000268007 s, 0.0 kB/s
[88623.524481] Buffer I/O error on device md0, logical block 458227712
- I'm sure this is not a filesystem problem, but something fishy with
md. As all disks are active and synced, if one had bad sectors md
should read the sector from another one, but apparently it is not
doing that.
Cheers
Pedro
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:07 ` Pedro Teixeira
@ 2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
0 siblings, 2 replies; 19+ messages in thread
From: Roman Mamedov @ 2014-07-02 10:11 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 11:07:13 +0100
Pedro Teixeira <finas@aeiou.pt> wrote:

> [88623.524481] Buffer I/O error on device md0, logical block 458227712

Ah, sorry, I missed these messages quoted in the original mail. Then of course it is not an FS issue.

-- 
With respect,
Roman
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:11 ` Roman Mamedov
@ 2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 10:37 UTC (permalink / raw)
To: linux-raid
I also ran

mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >> raid.b

and none of the bad blocks listed on the disks are in the range of the ones that are giving the read errors.

Also, is there a way to clear the badblocks list without destroying the filesystem?
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
@ 2014-07-02 11:03 ` Pedro Teixeira
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:03 UTC (permalink / raw)
To: linux-raid
Hi Neil,

" Can't possibly happen! (Do worry, I say that a lot - I'm usually wrong). " :)

- I'll simply do a dd if=/dev/md0 of=/dev/null and see what errors show up. I will report back when it finishes.

- Debian squeeze x64, with a custom 3.15.3 kernel and mdadm 3.3. The md volume was created with mdadm 3.3 and kernel 3.13 or 3.14, I think.

Cheers
Pedro
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
@ 2014-07-02 10:45 ` NeilBrown
2014-07-02 11:54 ` Pedro Teixeira
1 sibling, 1 reply; 19+ messages in thread
From: NeilBrown @ 2014-07-02 10:45 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid
On Wed, 02 Jul 2014 10:32:41 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> - I'm having the following problem on a raid6 md volume consisting og
> 16 1TB Seagtes SSHD's. ( using kernel 3.15.3 or 3.14.0 ) mdadm is 3.3.
>
> - every time I run a fsck.ext4 I will get the exact same errors (
> ...short read ). Forcing a repair on the md0 volume shows no errors
> and completes without problems. All disks are active and the volume is
> not degraded, still I can't get rid of the short errors on those 16
> blocks and when the filesystem is mounted the read errors will come up
> from time to time as they are probably in use.
>
> - If I try to read those blocks with DD ( dd if=/dev/md0 of=test.txt
> seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file
> but the file doesn't appear to have nothing on it ( and the file
> doesn't take the 1.8T on disk as the disk is much smaller )
>
> - this started happening after having a three disk failure. I
> recovered from that failure by recreating the array with the
> non-failed 13 disks plus the last failed one ( events didn't differ
> much ). I then readed the other disks. The failed disks are all
> physically good, tested them with hdat2 and they don't have read/write
> errors so I reused them. I don't know why they failed, maybe some
> incompatibility with SSHD's and the LSI HBA controller..
>
> root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
> 6+0 records in
> 6+0 records out
> 24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
> root@nas3:/# ls -lah teste.txt
> -rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
> root@nas3:/#
>
> root@nas3:/# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]
> sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
> 13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [16/16] [UUUUUUUUUUUUUUUU]
>
> - When doing a fsck.ext4 of /dev/md0 it returns the following ( and I
> can do it over and over again with the exact same errors) :
>
> root@nas3:/# fsck.ext4 -f /dev/md0
> e2fsck 1.42.10 (18-May-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Error reading block 458227712 (Attempt to read block from filesystem
> resulted in short read) while reading inode and block bitmaps. Ignore
> error<y>? yes

Can't possibly happen! (Do worry, I say that a lot - I'm usually wrong).

What sort of computer? In particular, is it 32-bit or 64-bit?

Try using 'dd' to read a few meg at various offsets (1G, 2G, 4G, 6G, 8G, ....)
and find out if there is a pattern to where it can read and where it cannot.

NeilBrown
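The offset probing suggested above can also be scripted instead of retyping dd by hand. A minimal Python sketch, assuming a Unix system where os.pread is available; the function name and the 4 MiB default are illustrative choices, not something prescribed in the thread.

```python
import os


def probe_offsets(path, offsets, length=4 * 1024 * 1024):
    """Try to read `length` bytes at each byte offset of `path`.

    Returns a dict mapping each offset to the number of bytes actually
    read, or to the OS error text (e.g. "Input/output error") if the
    read failed.
    """
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in offsets:
            try:
                # pread reads at an absolute offset without moving the
                # file position, much like dd's skip= on the input.
                results[offset] = len(os.pread(fd, length, offset))
            except OSError as exc:
                results[offset] = exc.strerror
    finally:
        os.close(fd)
    return results
```

Run as root, something like probe_offsets("/dev/md0", [n * 2**30 for n in (1, 2, 4, 6, 8)]) would cover the offsets mentioned; any pattern in the failing offsets is then easier to spot than in scattered dd runs.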
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 10:45 ` NeilBrown
@ 2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 11:54 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
CPU is a Phenom X6, 8 GB RAM. The controller is an LSI 9201-i16. The HDDs are Seagate SSHD ST1000DX001.

So I ran "dd if=/dev/md0 of=/dev/null bs=4096" and it failed in a lot of places. I had to restart the command several times with the skip parameter set to a couple of blocks after the last failing block. It ran for about 1.5TB of the volume's 13TB total. The md volume didn't drop any drive while running this.

dmesg showed:

[ 1678.478156] Buffer I/O error on device md0, logical block 196012546
[ 1678.478314] Buffer I/O error on device md0, logical block 196012547
[ 1678.478462] Buffer I/O error on device md0, logical block 196012548
[ 1678.478737] Buffer I/O error on device md0, logical block 196012549
[ 1678.479077] Buffer I/O error on device md0, logical block 196012550
[ 1678.479415] Buffer I/O error on device md0, logical block 196012551
[ 1678.479754] Buffer I/O error on device md0, logical block 196012552
[ 1678.480082] Buffer I/O error on device md0, logical block 196012553
[ 1678.480679] Buffer I/O error on device md0, logical block 196012630
[ 1678.480811] Buffer I/O error on device md0, logical block 196012758
[ 2305.139382] quiet_error: 369 callbacks suppressed
[ 2305.139385] Buffer I/O error on device md0, logical block 196012759
[ 2310.592687] Buffer I/O error on device md0, logical block 196012760
[ 2313.135470] Buffer I/O error on device md0, logical block 196012761
[ 2315.971196] Buffer I/O error on device md0, logical block 196012762
[ 2319.013647] Buffer I/O error on device md0, logical block 196012763
[ 2321.125008] Buffer I/O error on device md0, logical block 196012764
[ 2323.774654] Buffer I/O
error on device md0, logical block 196012765 [ 2327.439527] Buffer I/O error on device md0, logical block 196012766 [ 2329.399068] Buffer I/O error on device md0, logical block 196012767 [ 2331.389823] Buffer I/O error on device md0, logical block 196012768 [ 2334.166786] Buffer I/O error on device md0, logical block 196012769 [ 2337.817145] Buffer I/O error on device md0, logical block 196012770 [ 2340.713005] Buffer I/O error on device md0, logical block 196012771 [ 2342.594948] Buffer I/O error on device md0, logical block 196012772 [ 2344.678599] Buffer I/O error on device md0, logical block 196012773 [ 2347.150423] Buffer I/O error on device md0, logical block 196012774 [ 2349.433777] Buffer I/O error on device md0, logical block 196012775 [ 2351.559728] Buffer I/O error on device md0, logical block 196012776 [ 2353.650886] Buffer I/O error on device md0, logical block 196012777 [ 2385.719365] Buffer I/O error on device md0, logical block 196012778 [ 2388.937566] Buffer I/O error on device md0, logical block 196012779 [ 2391.831046] Buffer I/O error on device md0, logical block 196012780 [ 2393.971170] Buffer I/O error on device md0, logical block 196012781 [ 2396.118172] Buffer I/O error on device md0, logical block 196012782 [ 2399.717491] Buffer I/O error on device md0, logical block 196012783 [ 2401.913373] Buffer I/O error on device md0, logical block 196012784 [ 2403.892253] Buffer I/O error on device md0, logical block 196012785 [ 2405.796383] Buffer I/O error on device md0, logical block 196012786 [ 2408.171017] Buffer I/O error on device md0, logical block 196012787 [ 2410.233107] Buffer I/O error on device md0, logical block 196012788 [ 2413.184341] Buffer I/O error on device md0, logical block 196012789 [ 2416.396825] Buffer I/O error on device md0, logical block 196012790 [ 2420.734772] Buffer I/O error on device md0, logical block 196012890 [ 2426.320297] Buffer I/O error on device md0, logical block 196013570 [ 2426.320397] Buffer I/O error on 
device md0, logical block 196013571 [ 2426.320504] Buffer I/O error on device md0, logical block 196013572 [ 2426.320595] Buffer I/O error on device md0, logical block 196013573 [ 2426.320686] Buffer I/O error on device md0, logical block 196013574 [ 2426.320778] Buffer I/O error on device md0, logical block 196013575 [ 2426.320877] Buffer I/O error on device md0, logical block 196013576 [ 2426.321024] Buffer I/O error on device md0, logical block 196013577 [ 2426.321193] Buffer I/O error on device md0, logical block 196013578 [ 2436.240507] quiet_error: 119 callbacks suppressed [ 2436.240509] Buffer I/O error on device md0, logical block 196012900 [ 2440.078873] Buffer I/O error on device md0, logical block 196012910 [ 2442.323624] Buffer I/O error on device md0, logical block 196012920 [ 2445.852897] Buffer I/O error on device md0, logical block 196013570 [ 2454.009848] Buffer I/O error on device md0, logical block 196013570 [ 2456.810436] Buffer I/O error on device md0, logical block 196013570 [ 2461.672818] Buffer I/O error on device md0, logical block 196014336 [ 2461.672901] Buffer I/O error on device md0, logical block 196014464 [ 2461.672985] Buffer I/O error on device md0, logical block 196014337 [ 2461.673109] Buffer I/O error on device md0, logical block 196014465 [ 2461.695280] Buffer I/O error on device md0, logical block 196014592 [ 2461.695371] Buffer I/O error on device md0, logical block 196014720 [ 2461.695458] Buffer I/O error on device md0, logical block 196014593 [ 2461.695548] Buffer I/O error on device md0, logical block 196014721 [ 2461.695633] Buffer I/O error on device md0, logical block 196014336 [ 2465.937036] Buffer I/O error on device md0, logical block 196125442 [ 2538.797979] quiet_error: 252 callbacks suppressed [ 2538.797982] Buffer I/O error on device md0, logical block 217780096 [ 2538.798084] Buffer I/O error on device md0, logical block 217780097 [ 2538.798163] Buffer I/O error on device md0, logical block 217780098 [ 
2538.798240] Buffer I/O error on device md0, logical block 217780099 [ 2538.798321] Buffer I/O error on device md0, logical block 217780100 [ 2538.798404] Buffer I/O error on device md0, logical block 217780101 [ 2538.798486] Buffer I/O error on device md0, logical block 217780102 [ 2538.798569] Buffer I/O error on device md0, logical block 217780103 [ 2538.798681] Buffer I/O error on device md0, logical block 217780104 [ 2538.798812] Buffer I/O error on device md0, logical block 217780105 [ 2582.229715] quiet_error: 607 callbacks suppressed [ 2582.229717] Buffer I/O error on device md0, logical block 217780106 [ 2584.667289] Buffer I/O error on device md0, logical block 217780107 [ 2590.211304] Buffer I/O error on device md0, logical block 228358304 [ 2590.211388] Buffer I/O error on device md0, logical block 228358432 [ 2590.211467] Buffer I/O error on device md0, logical block 228358560 [ 2590.211555] Buffer I/O error on device md0, logical block 228358305 [ 2590.211628] Buffer I/O error on device md0, logical block 228358433 [ 2590.211712] Buffer I/O error on device md0, logical block 228358561 [ 2590.211792] Buffer I/O error on device md0, logical block 228358306 [ 2590.211871] Buffer I/O error on device md0, logical block 228358434 [ 2590.211945] Buffer I/O error on device md0, logical block 228358562 [ 2590.212025] Buffer I/O error on device md0, logical block 228358307 [ 2652.455446] quiet_error: 375 callbacks suppressed [ 2652.455449] Buffer I/O error on device md0, logical block 260370751 [ 2652.455541] Buffer I/O error on device md0, logical block 260370752 [ 2652.455618] Buffer I/O error on device md0, logical block 260370753 [ 2652.455694] Buffer I/O error on device md0, logical block 260370754 [ 2652.455779] Buffer I/O error on device md0, logical block 260370755 [ 2652.455853] Buffer I/O error on device md0, logical block 260370756 [ 2652.455930] Buffer I/O error on device md0, logical block 260370757 [ 2652.456003] Buffer I/O error on device md0, 
logical block 260370758 [ 2652.456090] Buffer I/O error on device md0, logical block 260370759 [ 2652.456166] Buffer I/O error on device md0, logical block 260370760 [ 2695.663954] quiet_error: 56 callbacks suppressed [ 2695.663957] Buffer I/O error on device md0, logical block 262508480 [ 2695.664039] Buffer I/O error on device md0, logical block 262508608 [ 2695.664113] Buffer I/O error on device md0, logical block 262508736 [ 2695.664197] Buffer I/O error on device md0, logical block 262508481 [ 2695.664264] Buffer I/O error on device md0, logical block 262508609 [ 2695.664344] Buffer I/O error on device md0, logical block 262508737 [ 2695.664417] Buffer I/O error on device md0, logical block 262508482 [ 2695.664489] Buffer I/O error on device md0, logical block 262508610 [ 2695.664557] Buffer I/O error on device md0, logical block 262508738 [ 2695.664632] Buffer I/O error on device md0, logical block 262508483 [ 2980.623591] quiet_error: 312 callbacks suppressed [ 2980.623595] Buffer I/O error on device md0, logical block 370515910 [ 2980.623676] Buffer I/O error on device md0, logical block 370516038 [ 2980.623761] Buffer I/O error on device md0, logical block 370515911 [ 2980.623828] Buffer I/O error on device md0, logical block 370516039 [ 2980.623903] Buffer I/O error on device md0, logical block 370515912 [ 2980.623970] Buffer I/O error on device md0, logical block 370516040 [ 2980.624046] Buffer I/O error on device md0, logical block 370515913 [ 2980.624119] Buffer I/O error on device md0, logical block 370516041 [ 2980.624191] Buffer I/O error on device md0, logical block 370515914 [ 2980.624262] Buffer I/O error on device md0, logical block 370516042 [ 3005.209442] quiet_error: 281 callbacks suppressed [ 3005.209444] Buffer I/O error on device md0, logical block 370516043 [ 3010.575774] Buffer I/O error on device md0, logical block 372582176 [ 3010.575854] Buffer I/O error on device md0, logical block 372582304 [ 3010.575927] Buffer I/O error on device 
md0, logical block 372582432 [ 3010.576004] Buffer I/O error on device md0, logical block 372582177 [ 3010.576082] Buffer I/O error on device md0, logical block 372582305 [ 3010.576147] Buffer I/O error on device md0, logical block 372582433 [ 3010.576232] Buffer I/O error on device md0, logical block 372582178 [ 3010.576298] Buffer I/O error on device md0, logical block 372582306 [ 3010.576361] Buffer I/O error on device md0, logical block 372582434 [ 3024.205000] quiet_error: 472 callbacks suppressed [ 3024.205003] Buffer I/O error on device md0, logical block 375180000 [ 3024.205082] Buffer I/O error on device md0, logical block 375180128 [ 3024.205154] Buffer I/O error on device md0, logical block 375180256 [ 3024.205229] Buffer I/O error on device md0, logical block 375180001 [ 3024.205308] Buffer I/O error on device md0, logical block 375180129 [ 3024.205374] Buffer I/O error on device md0, logical block 375180257 [ 3024.205441] Buffer I/O error on device md0, logical block 375180002 [ 3024.205509] Buffer I/O error on device md0, logical block 375180130 [ 3024.205581] Buffer I/O error on device md0, logical block 375180258 [ 3024.205655] Buffer I/O error on device md0, logical block 375180003 [ 3182.726623] quiet_error: 183 callbacks suppressed [ 3182.726626] Buffer I/O error on device md0, logical block 434495873 [ 3182.726708] Buffer I/O error on device md0, logical block 434495874 [ 3182.726787] Buffer I/O error on device md0, logical block 434495875 [ 3182.726857] Buffer I/O error on device md0, logical block 434495876 [ 3182.726927] Buffer I/O error on device md0, logical block 434495877 [ 3182.727036] Buffer I/O error on device md0, logical block 434495878 [ 3182.727129] Buffer I/O error on device md0, logical block 434495879 [ 3182.727210] Buffer I/O error on device md0, logical block 434495880 [ 3182.727292] Buffer I/O error on device md0, logical block 434495881 [ 3182.727374] Buffer I/O error on device md0, logical block 434495882 [ 3201.149784] 
quiet_error: 118 callbacks suppressed [ 3201.149786] Buffer I/O error on device md0, logical block 434495883 [ 3243.707353] Buffer I/O error on device md0, logical block 458225568 [ 3243.707439] Buffer I/O error on device md0, logical block 458225569 [ 3243.707526] Buffer I/O error on device md0, logical block 458225570 [ 3243.707600] Buffer I/O error on device md0, logical block 458225571 [ 3243.707675] Buffer I/O error on device md0, logical block 458225572 [ 3243.707748] Buffer I/O error on device md0, logical block 458225573 [ 3243.707825] Buffer I/O error on device md0, logical block 458225574 [ 3243.707903] Buffer I/O error on device md0, logical block 458225575 [ 3243.707975] Buffer I/O error on device md0, logical block 458225576 [ 3410.602968] quiet_error: 139 callbacks suppressed [ 3410.602971] Buffer I/O error on device md0, logical block 490875483 [ 3410.603049] Buffer I/O error on device md0, logical block 490875611 [ 3410.603126] Buffer I/O error on device md0, logical block 490875484 [ 3410.603204] Buffer I/O error on device md0, logical block 490875612 [ 3410.603279] Buffer I/O error on device md0, logical block 490875485 [ 3410.603349] Buffer I/O error on device md0, logical block 490875613 [ 3410.603424] Buffer I/O error on device md0, logical block 490875486 [ 3410.603509] Buffer I/O error on device md0, logical block 490875614 [ 3410.603592] Buffer I/O error on device md0, logical block 490875487 [ 3410.603663] Buffer I/O error on device md0, logical block 490875615 The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >> raid.b" before and after running the "dd" command returned no changes: Bad-blocks on /dev/sdb: 112269328 for 512 sectors 112269840 for 512 sectors 112271376 for 512 sectors 112271888 for 512 sectors 112272400 for 512 sectors 112272912 for 512 sectors 112273424 for 512 sectors 112273936 for 512 sectors 112333840 for 512 sectors 112334352 for 512 sectors 112337680 for 128 sectors 130752768 for 512 sectors 130753280 
for 512 sectors 130755840 for 512 sectors 130756352 for 512 sectors 130757120 for 384 sectors 149045752 for 512 sectors 149046264 for 512 sectors 212193536 for 512 sectors 212194048 for 512 sectors 248914952 for 512 sectors 248915464 for 512 sectors 262105344 for 512 sectors 262105856 for 512 sectors 273867480 for 512 sectors 273867992 for 512 sectors Bad-blocks list is empty in /dev/sdc Bad-blocks list is empty in /dev/sdd Bad-blocks on /dev/sde: 114228480 for 512 sectors 114228992 for 512 sectors Bad-blocks on /dev/sdf: 248545288 for 512 sectors 248545800 for 512 sectors 487421952 for 512 sectors 487422464 for 512 sectors 487422976 for 128 sectors Bad-blocks list is empty in /dev/sdg Bad-blocks on /dev/sdh: 280763096 for 512 sectors 280763608 for 512 sectors Bad-blocks list is empty in /dev/sdi Bad-blocks list is empty in /dev/sdj Bad-blocks on /dev/sdk: 124707840 for 512 sectors 124708352 for 512 sectors 124708864 for 512 sectors 124709376 for 512 sectors 124712192 for 384 sectors 130771840 for 256 sectors 130803968 for 512 sectors 130804480 for 512 sectors 130808960 for 256 sectors 130852224 for 256 sectors 130852608 for 256 sectors 130853120 for 256 sectors 130859520 for 256 sectors 150267392 for 512 sectors 150267904 for 512 sectors 211985968 for 512 sectors 211986480 for 512 sectors 212037552 for 256 sectors 212051504 for 512 sectors 212052016 for 512 sectors 213166336 for 512 sectors 213166848 for 512 sectors 213167360 for 512 sectors 213167872 for 512 sectors 213177600 for 512 sectors 213178112 for 512 sectors 214650624 for 512 sectors 214651136 for 512 sectors 249476104 for 512 sectors 249476616 for 512 sectors 262317312 for 512 sectors 262317824 for 512 sectors 262318464 for 512 sectors 262318976 for 256 sectors 262321408 for 512 sectors 262321920 for 512 sectors 714478672 for 512 sectors 714479184 for 512 sectors 714754128 for 512 sectors 714754640 for 512 sectors 714755152 for 512 sectors 714755664 for 512 sectors 935584432 for 512 sectors 935584944 
for 512 sectors 940173568 for 512 sectors 940174080 for 512 sectors 976792224 for 512 sectors 976792736 for 512 sectors 976793248 for 512 sectors 976793760 for 512 sectors 980668064 for 512 sectors 980668576 for 512 sectors 980669088 for 512 sectors 980669600 for 512 sectors Bad-blocks on /dev/sdl: 112269328 for 512 sectors 112269840 for 512 sectors 112271376 for 512 sectors 112271376 for 512 sectors 112271888 for 512 sectors 112272400 for 512 sectors 112272912 for 512 sectors 112273424 for 512 sectors 112273936 for 512 sectors 112333840 for 512 sectors 112334352 for 512 sectors 112337680 for 128 sectors 114228480 for 512 sectors 114228992 for 512 sectors 124707840 for 512 sectors 124708352 for 512 sectors 124708864 for 512 sectors 124709376 for 512 sectors 124712192 for 384 sectors 130752768 for 512 sectors 130753280 for 512 sectors 130755840 for 512 sectors 130756352 for 512 sectors 130757120 for 384 sectors 130771840 for 256 sectors 130803968 for 512 sectors 130804480 for 512 sectors 130808960 for 256 sectors 130852224 for 256 sectors 130852608 for 256 sectors 130853120 for 256 sectors 130859520 for 256 sectors 149045752 for 512 sectors 149046264 for 512 sectors 150267392 for 512 sectors 150267904 for 512 sectors 211985968 for 512 sectors 211986480 for 512 sectors 211996592 for 128 sectors 212037552 for 256 sectors 212051504 for 512 sectors 212052016 for 512 sectors 212193536 for 512 sectors 212194048 for 512 sectors 213166336 for 512 sectors 213166848 for 512 sectors 213167360 for 512 sectors 213167872 for 512 sectors 213177600 for 512 sectors 213178112 for 512 sectors 214650624 for 512 sectors 214651136 for 512 sectors 248545288 for 512 sectors 248545800 for 512 sectors 248914952 for 512 sectors 248915464 for 512 sectors 249476104 for 512 sectors 249476616 for 512 sectors 262105344 for 512 sectors 262105856 for 512 sectors 262317312 for 512 sectors 262317824 for 512 sectors 262318464 for 512 sectors 262318976 for 256 sectors 262321408 for 512 sectors 262321920 
for 512 sectors 273867480 for 512 sectors 273867992 for 512 sectors 280763096 for 512 sectors 280763608 for 512 sectors 487421952 for 512 sectors 487422464 for 512 sectors 487422976 for 128 sectors 714478672 for 512 sectors 714479184 for 512 sectors 714754128 for 512 sectors 714754640 for 512 sectors 714755152 for 512 sectors 714755664 for 512 sectors 935584432 for 512 sectors 935584944 for 512 sectors 940173568 for 512 sectors 940174080 for 512 sectors 976792224 for 512 sectors 976792736 for 512 sectors 976793248 for 512 sectors 976793760 for 512 sectors 980668064 for 512 sectors 980668576 for 512 sectors 980669088 for 512 sectors 980669600 for 512 sectors Bad-blocks on /dev/sdm: 112269328 for 512 sectors 112269840 for 512 sectors 112271376 for 512 sectors 112271888 for 512 sectors 112272400 for 512 sectors 112272912 for 512 sectors 112273424 for 512 sectors 112273936 for 512 sectors 112333840 for 512 sectors 112334352 for 512 sectors 112337680 for 128 sectors 114228480 for 512 sectors 114228992 for 512 sectors 124707840 for 512 sectors 124708352 for 512 sectors 124708864 for 512 sectors 124709376 for 512 sectors 124712192 for 384 sectors 130752768 for 512 sectors 130753280 for 512 sectors 130755840 for 512 sectors 130756352 for 512 sectors 130757120 for 384 sectors 130771840 for 256 sectors 130803968 for 512 sectors 130804480 for 512 sectors 130808960 for 256 sectors 130852224 for 256 sectors 130852608 for 256 sectors 130853120 for 256 sectors 130859520 for 256 sectors 149045752 for 512 sectors 149046264 for 512 sectors 150267392 for 512 sectors 150267904 for 512 sectors 211985968 for 512 sectors 211986480 for 512 sectors 211996592 for 128 sectors 212037552 for 256 sectors 212051504 for 512 sectors 212052016 for 512 sectors 212193536 for 512 sectors 212194048 for 512 sectors 213166336 for 512 sectors 213166848 for 512 sectors 213167360 for 512 sectors 213167872 for 512 sectors 213177600 for 512 sectors 213178112 for 512 sectors 214650624 for 512 sectors 214651136 
for 512 sectors 248545288 for 512 sectors 248545800 for 512 sectors 248914952 for 512 sectors 248915464 for 512 sectors 249476104 for 512 sectors 249476616 for 512 sectors 262105344 for 512 sectors 262105856 for 512 sectors 262317312 for 512 sectors 262317824 for 512 sectors 262318464 for 512 sectors 262318976 for 256 sectors 262321408 for 512 sectors 262321920 for 512 sectors 273867480 for 512 sectors 273867992 for 512 sectors 280763096 for 512 sectors 280763608 for 512 sectors 487421952 for 512 sectors 487422464 for 512 sectors 487422976 for 128 sectors 714478672 for 512 sectors 714479184 for 512 sectors 714754128 for 512 sectors 714754640 for 512 sectors 714755152 for 512 sectors 714755664 for 512 sectors 935584432 for 512 sectors 935584944 for 512 sectors 940173568 for 512 sectors 940174080 for 512 sectors 976792224 for 512 sectors 976792736 for 512 sectors 976793248 for 512 sectors 976793760 for 512 sectors 980668064 for 512 sectors 980668576 for 512 sectors 980669088 for 512 sectors 980669600 for 512 sectors Bad-blocks on /dev/sdn: 112269328 for 512 sectors 112269840 for 512 sectors 112271376 for 512 sectors 112271888 for 512 sectors 112272400 for 512 sectors 112272912 for 512 sectors 112273424 for 512 sectors 112273936 for 512 sectors 112333840 for 512 sectors 112334352 for 512 sectors 112337680 for 128 sectors 114228480 for 512 sectors 114228992 for 512 sectors 124707840 for 512 sectors 124708352 for 512 sectors 124708864 for 512 sectors 124709376 for 512 sectors 124712192 for 384 sectors 130752768 for 512 sectors 130753280 for 512 sectors 130755840 for 512 sectors 130756352 for 512 sectors 130757120 for 384 sectors 130771840 for 256 sectors 130803968 for 512 sectors 130804480 for 512 sectors 130808960 for 256 sectors 130852224 for 256 sectors 130852608 for 256 sectors 130853120 for 256 sectors 130859520 for 256 sectors 149045752 for 512 sectors 149046264 for 512 sectors 150267392 for 512 sectors 150267904 for 512 sectors 211985968 for 512 sectors 211986480 
for 512 sectors 211996592 for 128 sectors 212037552 for 256 sectors 212051504 for 512 sectors 212052016 for 512 sectors 212193536 for 512 sectors 212194048 for 512 sectors 213166336 for 512 sectors 213166848 for 512 sectors 213167360 for 512 sectors 213167872 for 512 sectors 213177600 for 512 sectors 213178112 for 512 sectors 214650624 for 512 sectors 214651136 for 512 sectors 248545288 for 512 sectors 248545800 for 512 sectors 248914952 for 512 sectors 248915464 for 512 sectors 249476104 for 512 sectors 249476616 for 512 sectors 262105344 for 512 sectors 262105856 for 512 sectors 262317312 for 512 sectors 262317824 for 512 sectors 262318464 for 512 sectors 262318976 for 256 sectors 262321408 for 512 sectors 262321920 for 512 sectors 273867480 for 512 sectors 273867992 for 512 sectors 280763096 for 512 sectors 280763608 for 512 sectors 487421952 for 512 sectors 487422464 for 512 sectors 487422976 for 128 sectors 714478672 for 512 sectors 714479184 for 512 sectors 714754128 for 512 sectors 714754640 for 512 sectors 714755152 for 512 sectors 714755664 for 512 sectors 935584432 for 512 sectors 935584944 for 512 sectors 940173568 for 512 sectors 940174080 for 512 sectors 976792224 for 512 sectors 976792736 for 512 sectors 976793248 for 512 sectors 976793760 for 512 sectors 980668064 for 512 sectors 980668576 for 512 sectors 980669088 for 512 sectors 980669600 for 512 sectors Bad-blocks on /dev/sdo: 112269328 for 512 sectors 112269840 for 512 sectors 112271376 for 512 sectors 112271888 for 512 sectors 112272400 for 512 sectors 112272912 for 512 sectors 112273424 for 512 sectors 112273936 for 512 sectors 112333840 for 512 sectors 112334352 for 512 sectors 112337680 for 128 sectors 114228480 for 512 sectors 114228992 for 512 sectors 124707840 for 512 sectors 124708352 for 512 sectors 124708864 for 512 sectors 124709376 for 512 sectors 124712192 for 384 sectors 130752768 for 512 sectors 130753280 for 512 sectors 130755840 for 512 sectors 130756352 for 512 sectors 130757120 
for 384 sectors
130771840 for 256 sectors
130803968 for 512 sectors
130804480 for 512 sectors
130808960 for 256 sectors
130852224 for 256 sectors
130852608 for 256 sectors
130853120 for 256 sectors
130859520 for 256 sectors
149045752 for 512 sectors
149046264 for 512 sectors
150267392 for 512 sectors
150267904 for 512 sectors
211985968 for 512 sectors
211986480 for 512 sectors
211996592 for 128 sectors
212037552 for 256 sectors
212051504 for 512 sectors
212052016 for 512 sectors
212193536 for 512 sectors
212194048 for 512 sectors
213166336 for 512 sectors
213166848 for 512 sectors
213167360 for 512 sectors
213167872 for 512 sectors
213177600 for 512 sectors
213178112 for 512 sectors
214650624 for 512 sectors
214651136 for 512 sectors
248545288 for 512 sectors
248545800 for 512 sectors
248914952 for 512 sectors
248915464 for 512 sectors
249476104 for 512 sectors
249476616 for 512 sectors
262105344 for 512 sectors
262105856 for 512 sectors
262317312 for 512 sectors
262317824 for 512 sectors
262318464 for 512 sectors
262318976 for 256 sectors
262321408 for 512 sectors
262321920 for 512 sectors
273867480 for 512 sectors
273867992 for 512 sectors
280763096 for 512 sectors
280763608 for 512 sectors
487421952 for 512 sectors
487422464 for 512 sectors
487422976 for 128 sectors
714478672 for 512 sectors
714479184 for 512 sectors
714754128 for 512 sectors
714754640 for 512 sectors
714755152 for 512 sectors
714755664 for 512 sectors
935584432 for 512 sectors
935584944 for 512 sectors
940173568 for 512 sectors
940174080 for 512 sectors
976792224 for 512 sectors
976792736 for 512 sectors
976793248 for 512 sectors
976793760 for 512 sectors
980668064 for 512 sectors
980668576 for 512 sectors
980669088 for 512 sectors
980669600 for 512 sectors
Bad-blocks list is empty in /dev/sdp
Bad-blocks on /dev/sdq:
211996592 for 128 sectors

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>]
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
0 siblings, 2 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 14:14 UTC (permalink / raw)
To: Lars Täuber; +Cc: linux-raid

Hi Lars,

the output of those commands:

root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
4096
root@nas3:/# cat /sys/block/md0/queue/physical_block_size
4096
root@nas3:/#

The strange thing here is that dmesg is not polluted with SATA errors,
as it usually is when a hard disk has bad sectors or some other
hardware problem. The only messages in dmesg that hint at why reading
the md volume fails come from md itself.

Cheers
Pedro

Quoting Lars Täuber:
> Hi Pedro,
>
> maybe an issue with the logical/physical blocksize?
> What do these commands report?
>
> cat /sys/block/sdb/queue/physical_block_size
> cat /sys/block/md0/queue/physical_block_size
>
> Seagate says there are 4096 bytes/sector on these devices.
>
> Lars

^ permalink raw reply [flat|nested] 19+ messages in thread
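Lars asked about both the logical and the physical block size, but only the physical one was queried above. A minimal sketch that prints both for a list of devices might look like the following. The function name `block_sizes` and the `SYSFS_BLOCK` override are hypothetical and exist only for illustration and testing; the `queue/logical_block_size` and `queue/physical_block_size` sysfs attributes are the standard ones.

```shell
#!/bin/sh
# Print logical and physical block sizes for the given block devices.
# SYSFS_BLOCK is a hypothetical override for testing; it defaults to /sys/block.
block_sizes() {
    root="${SYSFS_BLOCK:-/sys/block}"
    for dev in "$@"; do
        q="$root/$dev/queue"
        if [ -r "$q/logical_block_size" ] && [ -r "$q/physical_block_size" ]; then
            printf '%s logical=%s physical=%s\n' "$dev" \
                "$(cat "$q/logical_block_size")" \
                "$(cat "$q/physical_block_size")"
        else
            # Device missing or attributes unreadable.
            printf '%s unavailable\n' "$dev"
        fi
    done
}

# Example, using device names from this thread:
#   block_sizes sdb md0
```

A mismatch between the two values on a member disk would only be a hint worth following up, not proof of a controller incompatibility.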
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 14:14 ` Pedro Teixeira
@ 2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
1 sibling, 0 replies; 19+ messages in thread
From: Lars Täuber @ 2014-07-02 14:55 UTC (permalink / raw)
To: linux-raid

Hi Pedro,

Wed, 02 Jul 2014 15:14:06 +0100
Pedro Teixeira <finas@aeiou.pt> ==> Lars Täuber <taeuber@bbaw.de> :
> Hi Lars,
>
> the output of those commands:
>
> root@nas3:/# cat /sys/block/sdb/queue/physical_block_size
> 4096
> root@nas3:/# cat /sys/block/md0/queue/physical_block_size
> 4096
> root@nas3:/#
>
> The strange thing here is that dmesg is not polluted with SATA errors,
> as it usually is when a hard disk has bad sectors or some other
> hardware problem. The only messages in dmesg that hint at why reading
> the md volume fails come from md itself.

Maybe because the controller-drive combination doesn't fit. Does the
controller report any errors?

The LSI 9201-i16 compatibility list doesn't mention any 4k SATA drive.
Only three 4k SAS drives (Seagate, though) are listed as compatible.
Maybe that's the cause?

Good luck
Lars

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23 10117 Berlin
Tel.: +49 30 20370-352 http://www.bbaw.de

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
@ 2014-07-02 16:35 ` Ethan Wilson
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
1 sibling, 1 reply; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 16:35 UTC (permalink / raw)
To: Pedro Teixeira, Lars Täuber; +Cc: linux-raid

You have multiple bad-block lists (an MD feature) that already contain
many sectors. These are earlier disk errors that were recorded in the MD
headers (one list per drive).

MD will no longer try to read from such sectors, and during reads it
will return an error to the upper layers immediately. This happens when
the stripe does not have enough good components left after excluding the
bad blocks. For example, raid5 can tolerate at most 1 disk with bad
blocks in a stripe, so with bad blocks on 2 different disks in the same
stripe MD will return a read error immediately, without trying.

That's why dmesg shows read errors from MD but not from the component
devices.

Now the question is how so many bad blocks came to be recorded on your
array. It seems very unlikely that so many disks of your array are in
such bad shape. This might indicate an MD bug in the bad-blocks code. I
am thinking of some form of erroneous propagation of bad blocks: for
example, writing to an area where an MD bad block exists might, instead
of clearing the bad block, have propagated it to the other disks in the
same stripe. Something like that.

See if you can check that writing to a bad block clears it. It will be
difficult to compute the correct offset to write to, though. You might
want to do some trial and error with dd together with blktrace. If you
can do that, you might also want to check that it behaves correctly even
when the write is not aligned to 512b or 4k. Obviously this test is
destructive with respect to your data in that location.

Another, easier test is to read with dd from the component device
itself. If MD has recorded a bad block there (even if that happened a
long time ago), the direct read with dd should also hit it, return an
error and stop, because bad blocks on the surface of a disk do not heal
by themselves over time.

Another test is to read from md0 with dd in an area where only 1 disk
has bad blocks (this probably requires some trial and error with
blktrace, because the offsets of md0 are not equal to the offsets of the
component devices). If MD works correctly, such a read should "heal" the
bad block: MD computes the data from parity on the other disks, then
writes over the bad block. The MD bad block should disappear.

The last 2 tests I described should not be destructive, except in case
of MD bugs.

EW

^ permalink raw reply [flat|nested] 19+ messages in thread
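The component-device read test described above could be prepared with a small helper like the one below. It is only a sketch: it turns lines in the "N for M sectors" form printed by `mdadm --examine-badblocks` into `dd` read commands and prints them without running anything. The function name `bb_to_dd` is hypothetical, and the sketch assumes the listed numbers are 512-byte sectors counted from the start of the component device.

```shell
#!/bin/sh
# Turn "<sector> for <count> sectors" lines (read from stdin) into dd read
# commands for the component device given as $1. Commands are printed only.
bb_to_dd() {
    dev="$1"
    while read -r start word count rest; do
        # Skip blank lines and headers such as "Bad-blocks on /dev/sdq:".
        case "$start" in
            ''|*[!0-9]*) continue ;;
        esac
        [ "$word" = "for" ] || continue
        printf 'dd if=%s of=/dev/null bs=512 skip=%s count=%s\n' \
            "$dev" "$start" "$count"
    done
}

# Example:
#   mdadm --examine-badblocks /dev/sdq | bb_to_dd /dev/sdq
```

Because the commands only read from the device and write to /dev/null, running them should be non-destructive; a clean read of a listed range would suggest the entry reflects invalid data rather than an unreadable surface.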
[parent not found: <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>]
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
@ 2014-07-02 21:34 ` Ethan Wilson
0 siblings, 0 replies; 19+ messages in thread
From: Ethan Wilson @ 2014-07-02 21:34 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: Lars Täuber, linux-raid

On 02/07/2014 20:28, Pedro Teixeira wrote:
>
> Hi Ethan,
>
> The thing here is that some of the bad blocks ( if not all ) that are
> giving read errors are not on the bad blocks list.

Are you sure? Please note that offsets are a complex topic: an offset
given by fsck is a sector offset in the md0 sense, while the device
bad-block list contains offsets in the device sense. To convert one into
the other you have to divide, or multiply, by the number of data disks
(approximately) and handle the remainder manually, also taking the
rotating parity into account. Not simple. Is this the computation that
you did?

> Specifically, the ones that show up when doing a fsck are not on any
> drive. For these sectors fsck tries to re-write them and md still
> throws an error but they are not added to the list.

Not "added" but "removed". Writing to a bad block should create valid
content, so the block should be removed from the list. If it is not,
then there is indeed probably a bug in the MD code; see my previous
post.

> I replaced sdm with a new disk. This was one that had a bunch of bad
> blocks reported by md, and after finishing the rebuild ( with no
> errors at all ) the --examine-badblocks still gives me the exact same
> list of errors. I would expect that replacing the disk by a new one
> would clear the errors.

This is the correct behaviour by design. The source disks did not have
valid content in those positions, so good data cannot be created from
nothing. Bad blocks will be replicated onto the new disk. "Bad" here is
more a synonym of "containing invalid data" than of "unreadable
surface".

> As I know the disks are good, is there any way of resetting the bad
> blocks list without destroying the filesystem?

This one I don't know, but doing that would probably not help to find
the bug.

Regards
EW

^ permalink raw reply [flat|nested] 19+ messages in thread
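To give a feel for the conversion described above, here is a rough sketch that maps a component-device sector to the range of md0 sectors covered by the stripe it belongs to. It deliberately ignores which disk holds which chunk (the rotating-parity part), so it only narrows the search; an entry that falls on a parity chunk has no md0 data sector of its own. The function name is hypothetical. The example values are the ones that appear in this thread: 512K chunks (1024 sectors), 14 data disks in the 16-disk RAID6, and a data offset of 262144 sectors.

```shell
#!/bin/sh
# Map a component-device sector to the md0 sector range of its stripe.
# All values are 512-byte sectors.
#   $1 device sector   $2 data offset   $3 chunk size   $4 data disks
dev_sector_to_md_range() {
    dev_sector=$1; data_offset=$2; chunk=$3; data_disks=$4
    # Stripe index on the device, after removing the data offset.
    stripe=$(( (dev_sector - data_offset) / chunk ))
    # One stripe covers data_disks chunks of the md0 address space.
    first=$(( stripe * data_disks * chunk ))
    last=$(( first + data_disks * chunk - 1 ))
    printf '%s %s\n' "$first" "$last"
}

# Example with the sdb entry from the bad-block listing:
#   dev_sector_to_md_range 112269328 262144 1024 14
```

Any md0 read error whose sector falls inside the printed range is a candidate match for that bad-block entry.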
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
@ 2014-07-02 16:43 ` John Stoffel
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-03 2:40 ` NeilBrown
2 siblings, 1 reply; 19+ messages in thread
From: John Stoffel @ 2014-07-02 16:43 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: NeilBrown, linux-raid

>>>>> "Pedro" == Pedro Teixeira <finas@aeiou.pt> writes:

Pedro> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
Pedro> seagate sshd ST1000DX001.

Pedro> So I run the "dd if=/dev/md0 of=/dev/null bs=4096" and it failed on
Pedro> alot of places. I had to restart the command several times with the
Pedro> skip parameter set to a couple of blocks after the last block error.
Pedro> It run for about 1.5TB of the total 13TB of the volume.
Pedro> The md volume didn't drop any drive when running this.

Can you destroy the filesystem and re-create the RAID6 from scratch by
any chance? Or can you maybe create a smaller array with only 6 devices
to run some tests?

Can you provide more details on your ext4 filesystem using tune2fs?
Have you tried using XFS instead? Does the filesystem have a logfile or
not? And does a full fsck run to completion?

Have you checked all the cables? Do you have RAID firmware on the LSI
card by any chance, or is it set up as JBOD?

Could you have too small a power supply, so that you're seeing
corruption on the system due to low voltage on one of the 5V or 12V
rails? Can you try powering half the disks from another power supply as
a test?

Do you have a graphics card in the system? If so, can you pull it and
run headless, or maybe put in a less power-hungry card?

John

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>]
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
@ 2014-07-02 18:41 ` Pedro Teixeira
2014-07-02 19:01 ` John Stoffel
1 sibling, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-02 18:41 UTC (permalink / raw)
To: John Stoffel, NeilBrown, linux-raid

Hi John,

I can't destroy the fs at the moment. The problem is not filesystem
related, as md throws an error when reading with dd while the
filesystem is not mounted.

The controller is flashed with the latest P19 firmware in IT mode,
meaning that the disks are "passed through". No raid or jbod.

The power supply has a single 12v rail and a total output of 800w.

The graphics card is a pcie 1x nvidia card.

I have a very similar machine that has the same case, the same power
supply, the same LSI controller in the same mode with the same firmware,
same OS, same kernel. The differences are the motherboard (Z87 chipset
and i7 cpu), and the hard disks, which are 16x 4TB seagate HDD's in
raid6, created the exact same way as this one with mdadm 3.3. I have no
problems with it.

Cheers
Pedro

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-02 18:41 ` Pedro Teixeira
@ 2014-07-02 19:01 ` John Stoffel
1 sibling, 0 replies; 19+ messages in thread
From: John Stoffel @ 2014-07-02 19:01 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: John Stoffel, NeilBrown, linux-raid

Pedro> I can't destroy the fs at the moment. The problem is not
Pedro> filesystem related as md throws an error when reading with dd
Pedro> when the filesystem is not mounted.

I hope you have backups of all this data, because I strongly suspect
you've run into either an MD coding problem, or you have the data
structures so confused that the MD array really needs to be rebuilt
from scratch.

Pedro> The controler is flashed with the latest P19 firmware IT mode,
Pedro> meaning that disks are "passed-though". No raid or jbod.

JBOD means Just a Bunch Of Disks, which is what you have, so good.

Pedro> Power supply has a singe 12v rail and total output of 800w.

Should be ok then.

Pedro> Graphics card is a pcie 1x nvidia card. I have a very similar
Pedro> machine, that has the same case, the same power supply, the
Pedro> same LSI controller in the same mode with the same firmware,
Pedro> same OS, same kernel. Diferences are the motherboard Z87
Pedro> chipset and i7 cpu, and the hard disks are 16x 4TB seagate
Pedro> HDD's in raid6 created the exact same way as this one. I have
Pedro> no problems with it.

Hmm... so how did the system crash and lose the disk(s) in the first
place? Did the cables get knocked? Are they in a disk cage or hot swap
bays? What kinds of physical cabling are you using here?

The suggestion to use blktrace to examine how IO flows into the MD
device and then down into the various devices is a good one, but I
don't have any good suggestions on what to do here.

But in any case, I'll repeat this now. Back up your data, and basically
assume some of it is toast and needs to be restored or re-created if at
all possible. With all the errors you're showing, there's bound to be
major filesystem corruption and even undetected corruption in some
files on there.

Not a good place to be. Too bad you can't just copy the data off to the
other machine with the 16 x 4TB disks. That would give you a good
chance to save your data.

John

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
2014-07-02 16:43 ` John Stoffel
@ 2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
` (2 more replies)
2 siblings, 3 replies; 19+ messages in thread
From: NeilBrown @ 2014-07-03 2:40 UTC (permalink / raw)
To: Pedro Teixeira; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3599 bytes --]

On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@aeiou.pt> wrote:

> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
> seagate sshd ST1000DX001.
>
> So I run the "dd if=/dev/md0 of=/dev/null bs=4096" and it failed on
> alot of places. I had to restart the command several times with the
> skip parameter set to a couple of blocks after the last block error.
> It run for about 1.5TB of the total 13TB of the volume.
> The md volume didn't drop any drive when running this.
>
> dmesg showed:
>
> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546

I love numbers, thanks.

The logical block size is 4096, or 8 sectors (1 sector is defined as 512
bytes), so this is at

   196012546*8 == 1568100368 sectors into the array.

The array has a chunksize of 512K, or 1024 sectors, so

   196012546*8/1024 = 1531348.015625

gives us the chunk number, and the remaining fraction of a chunk.

The RAID6 has 16 devices, so there are 14 data chunks in each stripe, so
to find where the above chunk is stored we divide by 14

   1531348/14 = 109382.0000

So that is chunk 109382 on the first device (though with rotating data,
it might not be the very first).

Add back in the fractional part, multiply by 1024 sectors per chunk, and
add the Data Offset,

   109382.01562500*1024+262144 = 112269328

So it seems that sector 112269328 on some device is bad.

> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
> raid.b" before and after running the "dd" command returned no changes:

I didn't notice the fact that the bad block logs were not empty before,
sorry. Anyway:...

> Bad-blocks on /dev/sdb:
> 112269328 for 512 sectors

Look at that - exactly the number I calculated. I love it when that
works out.

So the problem is exactly that some blocks are thought by md to be bad.

Blocks get recorded as bad (for raid6) when:

 - a 'read' reported an error which could not be fixed, either because
   the array was degraded so the data could not be recovered, or because
   the attempt to write restored data failed
 - when recovering a spare, if the data to be written cannot be found
   (due to errors on other devices)
 - when a 'write' request to a device fails

When your array had three failed devices, some reads and writes would
have failed. Maybe that caused the bad blocks to be recorded.
What sort of device failures were they? If the device became completely
inaccessible, then it would not have been possible to record the bad
block information.

Can you describe the sequence of events that led to the three failures?
When you put the array back together, did you --create it, or --assemble
--force?

There isn't an easy way to remove the bad block list, as doing so is
normally asking for data corruption.
However it is probably justified in your case.
As it happens I included code in the kernel to make it possible to
remove bad blocks from the list - it was intended for testing only but I
never removed it.
If you run

  sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
  while read; do
      echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
  done

then it should clear all of the bad blocks recorded on sdq.
You should probably fail/remove the last two devices that you added to
the array before you do this, as they probably don't have properly
uptodate information and doing this will cause corruption.

I probably need to think about better ways to handle the bad block
lists.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply [flat|nested] 19+ messages in thread
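The arithmetic in the message above can be reproduced with integer shell arithmetic. The sketch below follows the same steps (filesystem block to array sector, array sector to chunk number plus remainder, chunk number to stripe, then back to a device sector plus the data offset). The function name and parameter order are hypothetical; like the original calculation, it does not say which member disk holds the chunk.

```shell
#!/bin/sh
# Convert an md0 logical block number into the matching component-device sector.
#   $1 logical block number     $2 block size in bytes
#   $3 chunk size in sectors    $4 number of data disks
#   $5 data offset in sectors
md_block_to_dev_sector() {
    block=$1; block_bytes=$2; chunk=$3; data_disks=$4; data_offset=$5
    md_sector=$(( block * (block_bytes / 512) ))  # sectors into the array
    chunk_no=$(( md_sector / chunk ))             # which data chunk
    in_chunk=$(( md_sector % chunk ))             # sectors into that chunk
    stripe=$(( chunk_no / data_disks ))           # chunk index on each device
    printf '%s\n' $(( stripe * chunk + in_chunk + data_offset ))
}

# Values from this thread: block 196012546, 4096-byte blocks,
# 1024-sector chunks, 14 data disks, data offset 262144.
md_block_to_dev_sector 196012546 4096 1024 14 262144   # prints 112269328
```

The printed value matches the first entry reported for /dev/sdb by `mdadm --examine-badblocks`.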
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
@ 2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 8:29 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

Hi Neil,

Thanks for the very informative answer, that nailed it, and Ethan was
obviously onto it too!

I tried running the commands you posted and it gives me an error:

bb.sh
"
sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks
while read; do
echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
done
"

root@nas3:~# ./bb.sh
-211996592 128

./bb.sh: line 3: echo: write error: Invalid argument

Can you help me with this? I will clear all the bad blocks on all the
drives, force a repair, and see if some error shows up. If not, I will
then fsck the filesystem.

I'm not sure how the volume failed. On a Friday morning last month I
checked the system and everything was ok ( no dmesg errors, and mdstat
reported all disks up ). The next Monday I got a call telling me that
the volume was inaccessible. When I got back the following Thursday, the
machine had already been rebooted and the md0 volume had three failed
disks. I did a --examine, and two of them were completely off in terms
of events compared with the non-failed disks. The other one was much
closer, but still a bit off. Not close enough to do a --assemble
--force, so I recreated the array with something like this:

"mdadm --create --assume-clean --level=6 --raid-devices=16 --name=nas3:Datastore --uuid=9e97c588:59135324:c7d3fdf6:e543bdc3 /dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdb /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl missing /dev/sdn missing /dev/sdp /dev/sdq"

I think the last failed drive was sdl or sdn, I can't remember. Then I
cleared the superblocks on the missing disks and re-added them. Then I
fsck'd the filesystem and started getting those errors.

I have since replaced them with new disks and tested the old ones, only
to find that they have no smart errors reported ( smart is enabled in
the bios ). I also ran a read-write test on them and found them to be
ok.

I will rebuild this machine next weekend, or the one after that, to try
to sort out any hardware problem or issues with the cabling, but I am
inclined to say that maybe it's related to the sshd's.

Cheers
Pedro

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
@ 2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 10:39 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

I ended up understanding the command, but if I run it manually it
doesn't work. bad_blocks is cleared, but --examine-badblocks still
shows the entry, and after stopping/assembling the md volume the bad
block shows up again.

root@nas3:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@nas3:~# mdadm --assemble /dev/md0
mdadm: failed to get exclusive lock on mapfile - continue anyway...
mdadm: /dev/md0 has been started with 16 drives.
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
211996592 128
root@nas3:~# echo "-211996592 128" > /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# cat /sys/block/md0/md/dev-sdq/bad_blocks
root@nas3:~# mdadm --examine-badblocks /dev/sdq
Bad-blocks on /dev/sdq:
211996592 for 128 sectors
root@nas3:~#

So "cat /sys/block/md0/md/dev-sdq/bad_blocks" now shows no bad blocks,
but --examine-badblocks still lists the entry.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: strange problem with raid6 read errors on active non-degraded array
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
@ 2014-07-03 21:06 ` Pedro Teixeira
2 siblings, 0 replies; 19+ messages in thread
From: Pedro Teixeira @ 2014-07-03 21:06 UTC (permalink / raw)
To: linux-raid

I was able to fix the volume and the filesystem!

- The command Neil posted didn't work, but I got the idea and made a
script that cleared the list for all disks. --examine-badblocks still
listed the bad blocks, and stopping and assembling the volume again
populated the bad block list again. Still, I cleared them all again and
issued a "repair" on the volume. I got a bunch of errors from a couple
of disks, mostly sdk and sdb, but the volume synced till the end, and
after stopping it and assembling it again there were no bad blocks on
any disk, and --examine-badblocks also showed no bad blocks.

I have since replaced sdk and sdb, with no errors when syncing and no
errors in dmesg.

After that I fsck'd the filesystem, and it's up and running again.

I will now replace the other two disks that exhibited read errors when
repairing the volume, as soon as I get some replacements.

Thanks all for the help!!!

As a suggestion, I would make md distinguish a read error that is
caused by no good stripe being available due to the bad block list from
other read errors, to ease troubleshooting, and maybe implement a way to
clear the bad block list from disks with mdadm ( and maybe force a
resync of that stripe after the list is cleared ).

Cheers
Pedro

^ permalink raw reply [flat|nested] 19+ messages in thread
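The script that cleared the lists on all disks was not posted. A minimal sketch of how such a script could look is below; it is not the script actually used. It relies on the "-start length" write syntax for the `bad_blocks` sysfs file that was shown working earlier in the thread. The function names are hypothetical. One detail worth noting about the loop posted earlier: `read` with no variable name stores the line in `REPLY`, so `$a` expands to nothing there, which is why this sketch names its variables explicitly. The earlier warning still applies: clearing the list is normally asking for data corruption.

```shell
#!/bin/sh
# Convert a bad_blocks listing ("<start> <length>" per line, on stdin)
# into removal requests ("-<start> <length>", on stdout).
bb_removal_requests() {
    while read -r start length; do
        # Skip blank or incomplete lines.
        [ -n "$start" ] && [ -n "$length" ] || continue
        printf '%s%s %s\n' '-' "$start" "$length"
    done
}

# Apply the removal requests to every member device of an array.
#   $1 is the md sysfs directory, for example /sys/block/md0/md
clear_bad_blocks() {
    md_dir="$1"
    for f in "$md_dir"/dev-*/bad_blocks; do
        [ -e "$f" ] || continue
        # Snapshot the list first so the file is not read while written.
        requests=$(bb_removal_requests < "$f")
        [ -n "$requests" ] || continue
        # One write per entry, as in the manual echo shown in the thread.
        printf '%s\n' "$requests" | while read -r req; do
            printf '%s\n' "$req" > "$f"
        done
    done
}

# Example (destructive to the recorded list, run only if justified):
#   clear_bad_blocks /sys/block/md0/md
```

As reported in the thread, clearing the sysfs list alone did not survive a stop and reassemble; the entries only stayed gone after a subsequent "repair" pass completed.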
end of thread, other threads:[~2014-07-03 21:06 UTC | newest]
Thread overview: 19+ messages
2014-07-02 9:32 strange problem with raid6 read errors on active non-degraded array Pedro Teixeira
2014-07-02 9:52 ` Roman Mamedov
2014-07-02 10:07 ` Pedro Teixeira
2014-07-02 10:11 ` Roman Mamedov
2014-07-02 10:37 ` Pedro Teixeira
2014-07-02 11:03 ` Pedro Teixeira
2014-07-02 10:45 ` NeilBrown
2014-07-02 11:54 ` Pedro Teixeira
[not found] ` <20140702152429.742a3e8ea8bd100f5b3bae1f@bbaw.de>
2014-07-02 14:14 ` Pedro Teixeira
2014-07-02 14:55 ` Lars Täuber
2014-07-02 16:35 ` Ethan Wilson
[not found] ` <20140702192825.Horde.18y4TPYRo99TtE9JC9kSzUA@webmail.aeiou.pt>
2014-07-02 21:34 ` Ethan Wilson
2014-07-02 16:43 ` John Stoffel
[not found] ` <20140702193706.Horde.Q4yuGvYRo99TtFFSw8qw6-A@webmail.aeiou.pt>
2014-07-02 18:41 ` Pedro Teixeira
2014-07-02 19:01 ` John Stoffel
2014-07-03 2:40 ` NeilBrown
2014-07-03 8:29 ` Pedro Teixeira
2014-07-03 10:39 ` Pedro Teixeira
2014-07-03 21:06 ` Pedro Teixeira