* Buffer I/O error on dev md5, logical block 7073536, async page read
@ 2016-10-30 2:16 Marc MERLIN
2016-10-30 9:33 ` Andreas Klauer
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-10-30 2:16 UTC (permalink / raw)
To: linux-raid
Howdy,
I'm struggling with this problem.
I have this md5 array with 5 drives:
Personalities : [linear] [raid0] [raid1] [raid10] [multipath] [raid6] [raid5] [raid4]
md5 : active raid5 sdg1[0] sdh1[6] sdf1[2] sde1[3] sdd1[5]
15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 0/30 pages [0KB], 65536KB chunk
I started having filesystem problems with it, so I did a scan with hdrecover on the drives first,
and that passed. Then I did it on the md5 array, and it failed.
With a simple dd, I get this:
25526374400 bytes (26 GB) copied, 249.888 s, 102 MB/s
dd: reading `/dev/md5': Input/output error
56588288+0 records in
56588288+0 records out
28973203456 bytes (29 GB) copied, 283.325 s, 102 MB/s
[1]+ Exit 1 dd if=/dev/md5 of=/dev/null
kernel: [202693.708639] Buffer I/O error on dev md5, logical block 7073536, async page read
Yes, I can read the entire disk devices without a problem (it took a long
time to run, but it finished).
Can someone tell me how this is possible?
More generally, is it possible for the kernel to return an md error and then not log
any underlying hardware error on the drives the md was being read from?
Kernel 4.6.0. I'll upgrade just in case, but md has been stable enough for so many years that I'm
thinking the problem is likely elsewhere.
Any ideas?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 2:16 Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
@ 2016-10-30 9:33 ` Andreas Klauer
2016-10-30 15:38 ` Marc MERLIN
0 siblings, 1 reply; 26+ messages in thread
From: Andreas Klauer @ 2016-10-30 9:33 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-raid
On Sat, Oct 29, 2016 at 07:16:14PM -0700, Marc MERLIN wrote:
> Can someone tell me how this is possible?
> More generally, is it possible for the kernel to return an md error
> and then not log any underlying hardware error on the drives the md
> was being read from?
Is there something in mdadm --examine(-badblocks) /dev/sd*?
Regards
Andreas Klauer
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 9:33 ` Andreas Klauer
@ 2016-10-30 15:38 ` Marc MERLIN
2016-10-30 16:19 ` Andreas Klauer
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-10-30 15:38 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
On Sun, Oct 30, 2016 at 10:33:37AM +0100, Andreas Klauer wrote:
> On Sat, Oct 29, 2016 at 07:16:14PM -0700, Marc MERLIN wrote:
> > Can someone tell me how this is possible?
> > More generally, is it possible for the kernel to return an md error
> > and then not log any underlying hardware error on the drives the md
> > was being read from?
>
> Is there something in mdadm --examine(-badblocks) /dev/sd*?
Well, well, I learned something new today. First I had to upgrade my mdadm
tools to get that option, and sure enough:
myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
Bad-blocks on /dev/sdd1:
14408704 for 352 sectors
14409568 for 160 sectors
132523032 for 512 sectors
372496968 for 440 sectors
Bad-blocks list is empty in /dev/sde1
Bad-blocks on /dev/sdf1:
14408704 for 352 sectors
14409568 for 160 sectors
132523032 for 512 sectors
372496968 for 440 sectors
Bad-blocks list is empty in /dev/sdg1
Bad-blocks list is empty in /dev/sdh1
So thank you for pointing me in the right direction.
I think they are due to the fact that it's an external disk array on a port
multiplier where sometimes I get bus errors that aren't actually on the
disks.
Questions:
1) Shouldn't my array have been invalidated if I have bad blocks on 2 drives
in the same place? Or is the only possible way for this to happen that it did
get invalidated and I somehow force-rebuilt the array to bring it back up
without remembering doing so?
(mmmh, but even so, rebuilding the spare should have cleared the bad blocks
on at least one drive, no?)
2) I'm currently running this, which I believe is the way to recover:
myth:~# echo 'check' > /sys/block/md5/md/sync_action
but I'm not too hopeful about how that's going to work out if I have 2 drives with
supposed bad blocks at the same offsets.
Is there another way to just clear the bad block list on both drives if I've
already verified that those blocks are not bad and that they were due to some
I/O errors that came from a bad cable connection?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 15:38 ` Marc MERLIN
@ 2016-10-30 16:19 ` Andreas Klauer
2016-10-30 16:34 ` Phil Turmel
2016-10-30 16:43 ` Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
0 siblings, 2 replies; 26+ messages in thread
From: Andreas Klauer @ 2016-10-30 16:19 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-raid
On Sun, Oct 30, 2016 at 08:38:57AM -0700, Marc MERLIN wrote:
> (mmmh, but even so, rebuilding the spare should have cleared the bad blocks
> on at least one drive, no?)
If n+1 disks have bad blocks there's no data to sync over, so they just
propagate and stay bad forever. Or at least that's how it seemed to work
last time I tried it. I'm not familiar with bad blocks. I just turn it off.
As long as the bad block list is empty you can --update=no-bbl.
If everything else fails - edit the metadata or carefully recreate.
Which I don't recommend because you can go wrong in a hundred ways.
I don't remember if anyone ever had a proper solution to this.
It came up a couple of times on the list so you could search.
If you've replaced drives since, the drive that has been part of the array
the longest is probably the most likely to still have valid data in there.
That could be synced over to the other drives once the bbl is cleared.
It might not matter; you'd have to check with your filesystems whether they
believe any files are located there. (Filesystems sometimes maintain their
own bad block lists so you'd have to check those too.)
Regards
Andreas Klauer
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 16:19 ` Andreas Klauer
@ 2016-10-30 16:34 ` Phil Turmel
2016-10-30 17:12 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
2016-10-30 16:43 ` Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
1 sibling, 2 replies; 26+ messages in thread
From: Phil Turmel @ 2016-10-30 16:34 UTC (permalink / raw)
To: Andreas Klauer, Marc MERLIN; +Cc: linux-raid
On 10/30/2016 12:19 PM, Andreas Klauer wrote:
> On Sun, Oct 30, 2016 at 08:38:57AM -0700, Marc MERLIN wrote:
>> (mmmh, but even so, rebuilding the spare should have cleared the bad blocks
>> on at least one drive, no?)
>
> If n+1 disks have bad blocks there's no data to sync over, so they just
> propagate and stay bad forever. Or at least that's how it seemed to work
> last time I tried it. I'm not familiar with bad blocks. I just turn it off.
I, too, turn it off. (I never let it turn on, actually.)
I'm a little disturbed that this feature has become the default on new
arrays. This feature was introduced specifically to support underlying
storage technologies that cannot perform their own bad block management.
And since it doesn't implement any relocation algorithm for blocks
marked bad, it simply gives up any redundancy for affected sectors. And
when there's no remaining redundancy, it simply passes the error up the
stack. In this case, your errors were created by known communications
weaknesses that should always be recoverable with --assemble --force.
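For the record, recovering from such a kicked-drives event would usually be
something along these lines (untested here; device names taken from your
earlier mail):

  mdadm --stop /dev/md5
  mdadm --assemble --force /dev/md5 /dev/sd[defgh]1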
As far as I'm concerned, the bad block system is an incomplete feature
that should never be used in production, and certainly not on top of any
storage technology that implements error detection, correction, and
relocation. Like, every modern SATA and SAS drive.
Phil
* clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-10-30 16:34 ` Phil Turmel
@ 2016-10-30 17:12 ` Marc MERLIN
2016-10-30 17:16 ` Marc MERLIN
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
1 sibling, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-10-30 17:12 UTC (permalink / raw)
To: Phil Turmel, Neil Brown; +Cc: Andreas Klauer, linux-raid
Hi Neil,
Could you offer any guidance here? Is there something else I can do to clear
those fake bad blocks (the underlying disks are fine, I scanned them)
without rebuilding the array?
On Sun, Oct 30, 2016 at 12:34:56PM -0400, Phil Turmel wrote:
> On 10/30/2016 12:19 PM, Andreas Klauer wrote:
> > On Sun, Oct 30, 2016 at 08:38:57AM -0700, Marc MERLIN wrote:
> >> (mmmh, but even so, rebuilding the spare should have cleared the bad blocks
> >> on at least one drive, no?)
> >
> > If n+1 disks have bad blocks there's no data to sync over, so they just
> > propagate and stay bad forever. Or at least that's how it seemed to work
> > last time I tried it. I'm not familiar with bad blocks. I just turn it off.
>
> I, too, turn it off. (I never let it turn on, actually.)
>
> I'm a little disturbed that this feature has become the default on new
> arrays. This feature was introduced specifically to support underlying
> storage technologies that cannot perform their own bad block management.
> And since it doesn't implement any relocation algorithm for blocks
> marked bad, it simply gives up any redundancy for affected sectors. And
> when there's no remaining redundancy, it simply passes the error up the
> stack. In this case, your errors were created by known communications
> weaknesses that should always be recoverable with --assemble --force.
>
> As far as I'm concerned, the bad block system is an incomplete feature
> that should never be used in production, and certainly not on top of any
> storage technology that implements error detection, correction, and
> relocation. Like, every modern SATA and SAS drive.
Agreed. Just to confirm, I did indeed not willingly turn this on, and I
really wish it had not been turned on automatically.
As you point out, I've never needed this; cabling-induced problems just
used to kill my array, and I would fix the cabling and manually rebuild it.
Now my array doesn't get killed, but it gets rendered not very usable and
causes my filesystem (btrfs) to abort and fail when I access the wrong parts
of it.
I'm now stuck with those fake bad blocks that I can't remove without some
complicated surgery: editing md metadata on disk, or recreating an array
on top of the current one with the option disabled and hoping things line up.
This really ought to work, or something similar:
myth:~# mdadm --assemble --force --update=no-bbl /dev/md5
mdadm: Cannot remove active bbl from /dev/sdf1
mdadm: Cannot remove active bbl from /dev/sdd1
mdadm: /dev/md5 has been started with 5 drives.
(as in the array was assembled, but it's not really useful without those
fake bad blocks cleared from the bad block list)
And yes, I agree that bad blocks should not be a default. Now I really wish
they had never been turned on automatically; I already lost a week scanning
this array and chasing problems over this feature, which it turns out made a
wrong assumption and doesn't seem to let me clear it :-/
Thanks to you both for your answers and for pointing me in the right direction.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-10-30 17:12 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
@ 2016-10-30 17:16 ` Marc MERLIN
2016-11-04 18:18 ` Marc MERLIN
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-10-30 17:16 UTC (permalink / raw)
To: Phil Turmel, Neil Brown, Andreas Klauer; +Cc: linux-raid
On Sun, Oct 30, 2016 at 10:12:34AM -0700, Marc MERLIN wrote:
> Hi Neil,
>
> Could you offer any guidance here? Is there something else I can do to clear
> those fake bad blocks (the underlying disks are fine, I scanned them)
> without rebuilding the array?
On Sun, Oct 30, 2016 at 06:02:42PM +0100, Andreas Klauer wrote:
> > There should be some --update=no-bbl --force if the admin knows the bad
> > block list is wrong and due to IO issues not related to the drive.
>
> Good point. And hey, there it is.
>
> mdadm.c
>
> | if (strcmp(c.update, "bbl") == 0)
> | continue;
> | if (strcmp(c.update, "no-bbl") == 0)
> | continue;
> | if (strcmp(c.update, "force-no-bbl") == 0)
> | continue;
>
> force-no-bbl. It's in mdadm v3.4, not sure about older ones.
Oh, very nice, thank you. It's not in the man page, but it works:
myth:~# mdadm --assemble --update=force-no-bbl /dev/md5
mdadm: /dev/md5 has been started with 5 drives.
myth:~#
myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
No bad-blocks list configured on /dev/sdd1
No bad-blocks list configured on /dev/sde1
No bad-blocks list configured on /dev/sdf1
No bad-blocks list configured on /dev/sdg1
No bad-blocks list configured on /dev/sdh1
Now I'll make sure to turn off this feature on all my other arrays
in case it got turned on without my asking for it.
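Something like this per array should do it, if I understand the options
correctly (the array and device names here are just placeholders; the array
has to be stopped first):

  mdadm --stop /dev/mdN
  mdadm --assemble --update=no-bbl /dev/mdN     # or --update=force-no-bbl if entries remain
  mdadm --examine-badblocks /dev/sdX1           # should now report no bad-blocks list configured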
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-10-30 17:16 ` Marc MERLIN
@ 2016-11-04 18:18 ` Marc MERLIN
2016-11-04 18:22 ` Phil Turmel
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-11-04 18:18 UTC (permalink / raw)
To: Phil Turmel, Neil Brown, Andreas Klauer; +Cc: linux-raid
On Sun, Oct 30, 2016 at 10:16:54AM -0700, Marc MERLIN wrote:
> myth:~# mdadm --assemble --update=force-no-bbl /dev/md5
> mdadm: /dev/md5 has been started with 5 drives.
> myth:~#
> myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
> No bad-blocks list configured on /dev/sdd1
> No bad-blocks list configured on /dev/sde1
> No bad-blocks list configured on /dev/sdf1
> No bad-blocks list configured on /dev/sdg1
> No bad-blocks list configured on /dev/sdh1
>
> Now I'll make sure to turn off this feature on all my other arrays
> in case it got turned on without my asking for it.
Right, so I thought I was home free, but not even close. My array is
back up, the badblock feature is disabled, array reports clean, but I
cannot access data past 8.8TB, it just fails.
myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8797
dd: reading `/dev/md5': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000403171 s, 0.0 kB/s
myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8796
dd: reading `/dev/md5': Invalid argument
1+0 records in
1+0 records out
1000000000 bytes (1.0 GB) copied, 10.5817 s, 94.5 MB/s
myth:~# mdadm --query --detail /dev/md5
/dev/md5:
Version : 1.2
Creation Time : Tue Jan 21 10:35:52 2014
Raid Level : raid5
Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Oct 31 07:56:07 2016
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : gargamel.svh.merlins.org:5
UUID : ec672af7:a66d9557:2f00d76c:38c9f705
Events : 147992
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
6 8 113 1 active sync /dev/sdh1
2 8 81 2 active sync /dev/sdf1
3 8 65 3 active sync /dev/sde1
5 8 49 4 active sync /dev/sdd1
myth:~#
myth:~# mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : ec672af7:a66d9557:2f00d76c:38c9f705
Name : gargamel.svh.merlins.org:5
Creation Time : Tue Jan 21 10:35:52 2014
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=0 sectors
State : clean
Device UUID : 075571ff:411517e9:027f8c2f:cef0457a
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Oct 31 07:56:07 2016
Checksum : d4e74521 - correct
Events : 147992
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
All 5 devices look about the same apart from their serial numbers.
Any idea why it's failing that way?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 18:18 ` Marc MERLIN
@ 2016-11-04 18:22 ` Phil Turmel
2016-11-04 18:50 ` Marc MERLIN
0 siblings, 1 reply; 26+ messages in thread
From: Phil Turmel @ 2016-11-04 18:22 UTC (permalink / raw)
To: Marc MERLIN, Neil Brown, Andreas Klauer; +Cc: linux-raid
On 11/04/2016 02:18 PM, Marc MERLIN wrote:
> Right, so I thought I was home free, but not even close. My array is
> back up, the badblock feature is disabled, array reports clean, but I
> cannot access data past 8.8TB, it just fails.
>
> myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8797
> dd: reading `/dev/md5': Invalid argument
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000403171 s, 0.0 kB/s
> myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8796
> dd: reading `/dev/md5': Invalid argument
> 1+0 records in
> 1+0 records out
> 1000000000 bytes (1.0 GB) copied, 10.5817 s, 94.5 MB/s
That has nothing to do with MD. You are using a power of ten suffix in
your block size, so you are running into non-aligned sector locations.
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 18:22 ` Phil Turmel
@ 2016-11-04 18:50 ` Marc MERLIN
2016-11-04 18:59 ` Roman Mamedov
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-11-04 18:50 UTC (permalink / raw)
To: Phil Turmel; +Cc: Neil Brown, Andreas Klauer, linux-raid
On Fri, Nov 04, 2016 at 02:22:48PM -0400, Phil Turmel wrote:
> On 11/04/2016 02:18 PM, Marc MERLIN wrote:
>
> > Right, so I thought I was home free, but not even close. My array is
> > back up, the badblock feature is disabled, array reports clean, but I
> > cannot access data past 8.8TB, it just fails.
> >
> > myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8797
> > dd: reading `/dev/md5': Invalid argument
> > 0+0 records in
> > 0+0 records out
> > 0 bytes (0 B) copied, 0.000403171 s, 0.0 kB/s
> > myth:~# dd if=/dev/md5 of=/dev/null bs=1GB skip=8796
> > dd: reading `/dev/md5': Invalid argument
> > 1+0 records in
> > 1+0 records out
> > 1000000000 bytes (1.0 GB) copied, 10.5817 s, 94.5 MB/s
>
> That has nothing to do with MD. You are using a power of ten suffix in
> your block size, so you are running into non-aligned sector locations.
not really, I read the whole device from scratch (without skip) and it
read 8.8TB before it failed. It just takes 2 days to run, so it's a bit
annoying to do repeatedly :)
myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GB skip=8790
dd: reading `/dev/md5': Invalid argument
7+0 records in
7+0 records out
7000000000 bytes (7.0 GB) copied, 76.7736 s, 91.2 MB/s
It doesn't matter where I start, it fails exactly in the same place, and
I can't skip over it, anything after that mark is unreadable.
I can switch to GiB if you'd like, same thing:
myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 21.9751 s, 97.7 MB/s
myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8200
dd: reading `/dev/md5': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000281885 s, 0.0 kB/s
myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8500
dd: reading `/dev/md5': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000395691 s, 0.0 kB/s
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 18:50 ` Marc MERLIN
@ 2016-11-04 18:59 ` Roman Mamedov
2016-11-04 19:31 ` Roman Mamedov
2016-11-04 19:51 ` Marc MERLIN
0 siblings, 2 replies; 26+ messages in thread
From: Roman Mamedov @ 2016-11-04 18:59 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Phil Turmel, Neil Brown, Andreas Klauer, linux-raid
On Fri, 4 Nov 2016 11:50:40 -0700
Marc MERLIN <marc@merlins.org> wrote:
> I can switch to GiB if you'd like, same thing:
> myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> dd: reading `/dev/md5': Invalid argument
> 2+0 records in
> 2+0 records out
> 2147483648 bytes (2.1 GB) copied, 21.9751 s, 97.7 MB/s
But now you can see the cutoff point is exactly at 8192 -- a strangely familiar
number, much more so than "8.8 TB", right? :D
Could you recheck (and post) your mdadm --detail /dev/md5, if the whole array
didn't get cut to a half of its size in "Array Size".
Or maybe the remove bad block list code has some overflow bug which cuts each
device size to 2048 GiB, without the array size reflecting that. You run RAID5
of five members, (5-1)*2048 would give you exactly 8192 GiB.
--
With respect,
Roman
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 18:59 ` Roman Mamedov
@ 2016-11-04 19:31 ` Roman Mamedov
2016-11-04 20:02 ` Marc MERLIN
2016-11-04 19:51 ` Marc MERLIN
1 sibling, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2016-11-04 19:31 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Phil Turmel, Neil Brown, Andreas Klauer, linux-raid
On Fri, 4 Nov 2016 23:59:17 +0500
Roman Mamedov <rm@romanrm.net> wrote:
> On Fri, 4 Nov 2016 11:50:40 -0700
> Marc MERLIN <marc@merlins.org> wrote:
>
> > I can switch to GiB if you'd like, same thing:
> > myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> > dd: reading `/dev/md5': Invalid argument
> > 2+0 records in
> > 2+0 records out
> > 2147483648 bytes (2.1 GB) copied, 21.9751 s, 97.7 MB/s
>
> But now you can see the cutoff point is exactly at 8192 -- a strangely familiar
> number, much more so than "8.8 TB", right? :D
>
> Could you recheck (and post) your mdadm --detail /dev/md5, if the whole array
> didn't get cut to a half of its size in "Array Size".
Also check that member devices of /dev/md5 (/dev/sd*1 partitions) are still
larger than 2TB, and are still readable past 2TB.
--
With respect,
Roman
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 19:31 ` Roman Mamedov
@ 2016-11-04 20:02 ` Marc MERLIN
0 siblings, 0 replies; 26+ messages in thread
From: Marc MERLIN @ 2016-11-04 20:02 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Phil Turmel, Neil Brown, Andreas Klauer, linux-raid
On Sat, Nov 05, 2016 at 12:31:09AM +0500, Roman Mamedov wrote:
> Also check that member devices of /dev/md5 (/dev/sd*1 partitions) are still
> larger than 2TB, and are still readable past 2TB.
Just for my own sanity: if the drives had read errors, those would be
logged by the kernel, right?
I did run hdrecover on all those drives and it completed on all of them
(I did that first before checking anything else)
Here's me reading 1GB from each drive at the 3.5TB mark:
myth:/sys/block/md5/md# for i in /dev/sd[defgh]; do dd if=$i of=/dev/null bs=1GiB skip=3500 count=1; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 77.4343 s, 13.9 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 50.1179 s, 21.4 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 39.6499 s, 27.1 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 71.6397 s, 15.0 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 73.1003 s, 14.7 MB/s
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 18:59 ` Roman Mamedov
2016-11-04 19:31 ` Roman Mamedov
@ 2016-11-04 19:51 ` Marc MERLIN
2016-11-07 0:16 ` NeilBrown
1 sibling, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-11-04 19:51 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Phil Turmel, Neil Brown, Andreas Klauer, linux-raid
On Fri, Nov 04, 2016 at 11:59:17PM +0500, Roman Mamedov wrote:
> On Fri, 4 Nov 2016 11:50:40 -0700
> Marc MERLIN <marc@merlins.org> wrote:
>
> > I can switch to GiB if you'd like, same thing:
> > myth:/dev# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> > dd: reading `/dev/md5': Invalid argument
> > 2+0 records in
> > 2+0 records out
> > 2147483648 bytes (2.1 GB) copied, 21.9751 s, 97.7 MB/s
>
> But now you can see the cutoff point is exactly at 8192 -- a strangely familiar
> number, much more so than "8.8 TB", right? :D
Yes, that's a valid point :)
> Could you recheck (and post) your mdadm --detail /dev/md5, if the whole array
> didn't get cut to a half of its size in "Array Size".
I just posted it in my previous Email:
myth:~# mdadm --query --detail /dev/md5
/dev/md5:
Version : 1.2
Creation Time : Tue Jan 21 10:35:52 2014
Raid Level : raid5
Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Oct 31 07:56:07 2016
State : clean
(more in the previous Email)
> Or maybe the remove bad block list code has some overflow bug which cuts each
> device size to 2048 GiB, without the array size reflecting that. You run RAID5
> of five members, (5-1)*2048 would give you exactly 8192 GiB.
that's very possible too.
So even though the array is marked clean and I don't care if some md
blocks return data that is actually corrupt as long as the read succeeds
(my filesystem will sort that out), I figured I could try a repair.
What's interesting is that it started exactly at 50%, which is also
likely where my reads were failing.
myth:/sys/block/md5/md# echo repair > sync_action
md5 : active raid5 sdg1[0] sdd1[5] sde1[3] sdf1[2] sdh1[6]
15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[==========>..........] resync = 50.0% (1953925916/3906885632) finish=1899.1min speed=17138K/sec
bitmap: 0/30 pages [0KB], 65536KB chunk
That said, as this resync is processing, I'd think/hope it would move
the error forward, but it does not seem to:
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 27.8491 s, 77.1 MB/s
So basically I'm stuck in the same place, and it seems that I've found
an actual swraid bug in the kernel and I'm not hopeful that the problem
will be fixed after the resync completes.
If someone wants me to try stuff before I wipe it all and restart, let
me know, but otherwise I've been in this broken state for 3 weeks now
and I need to fix it so that I can restart my backups again.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-04 19:51 ` Marc MERLIN
@ 2016-11-07 0:16 ` NeilBrown
2016-11-07 1:13 ` Marc MERLIN
0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2016-11-07 0:16 UTC (permalink / raw)
To: Marc MERLIN, Roman Mamedov
Cc: Phil Turmel, Neil Brown, Andreas Klauer, linux-raid
On Sat, Nov 05 2016, Marc MERLIN wrote:
>
> What's interesting is that it started exactly at 50%, which is also
> likely where my reads were failing.
>
> myth:/sys/block/md5/md# echo repair > sync_action
>
> md5 : active raid5 sdg1[0] sdd1[5] sde1[3] sdf1[2] sdh1[6]
> 15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
> [==========>..........] resync = 50.0% (1953925916/3906885632) finish=1899.1min speed=17138K/sec
> bitmap: 0/30 pages [0KB], 65536KB chunk
Yep, that is weird.
You can cause that to happen by e.g
echo 7813771264 > /sys/block/md5/md/sync_min
but you are unlikely to have done that deliberately.
>
> That said, as this resync is processing, I'd think/hope it would move
> the error forward, but it does not seem to:
> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> dd: reading `/dev/md5': Invalid argument
> 2+0 records in
> 2+0 records out
> 2147483648 bytes (2.1 GB) copied, 27.8491 s, 77.1 MB/s
EINVAL from a read() system call is surprising in this context.....
do_generic_file_read can return it:
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return -EINVAL;
s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
That is 2^(12+31) or 2^43 or 8TB.
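Doing the same arithmetic in userspace (assuming a shell with 64-bit
arithmetic, just to illustrate the kernel macro above):

  $ echo $(( (4096 << 31) - 1 ))    # (PAGE_SIZE << (BITS_PER_LONG-1)) - 1, 4K pages, 32-bit longs
  8796093022207

which is just under 8 TiB, i.e. roughly the 8.8 TB (decimal) that dd
reported as the cutoff.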
Is this a 32bit system you are using? Such systems can only support
buffered IO up to 8TB. If you use iflag=direct to avoid buffering, you
should get access to the whole device.
If this is a 64bit system, then the problem must be elsewhere.
NeilBrown
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-07 0:16 ` NeilBrown
@ 2016-11-07 1:13 ` Marc MERLIN
2016-11-07 3:36 ` Phil Turmel
0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2016-11-07 1:13 UTC (permalink / raw)
To: NeilBrown
Cc: Roman Mamedov, Phil Turmel, Neil Brown, Andreas Klauer,
linux-raid
On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
> On Sat, Nov 05 2016, Marc MERLIN wrote:
> >
> > What's interesting is that it started exactly at 50%, which is also
> > likely where my reads were failing.
> >
> > myth:/sys/block/md5/md# echo repair > sync_action
> >
> > md5 : active raid5 sdg1[0] sdd1[5] sde1[3] sdf1[2] sdh1[6]
> > 15627542528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
> > [==========>..........] resync = 50.0% (1953925916/3906885632) finish=1899.1min speed=17138K/sec
> > bitmap: 0/30 pages [0KB], 65536KB chunk
>
> Yep, that is weird.
>
> You can cause that to happen by e.g
> echo 7813771264 > /sys/block/md5/md/sync_min
>
> but you are unlikely to have done that deliberately.
I might have done this by mistake instead of sync_speed_min, but as you
say, unlikely. Then again, this is not the main problem and I think you
did find the reason below.
> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
>
> #define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
>
> That is 2^(12+31) or 2^43 or 8TB.
>
> Is this a 32bit system you are using? Such systems can only support
> buffered IO up to 8TB. If you use iflag=direct to avoid buffering, you
> should get access to the whole device.
You found the problem, and you also found the reason why btrfs_tools
also fails past 8TB. It is indeed a 32bit distro. If I put a 64bit
kernel with the 32bit userland, there is a weird problem with a sound
driver/video driver sync, so I've stuck with 32bits.
This also explains why my btrfs filesystem mounts perfectly because the
kernel knows how to deal with it, but as soon as I use btrfs check
(32bits), it fails to access data past the 8TB limit, and falls on its
face too.
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190 count=3 iflag=direct
3+0 records in
3+0 records out
3221225472 bytes (3.2 GB) copied, 41.0663 s, 78.4 MB/s
So a big thanks for solving this mystery.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
2016-11-07 1:13 ` Marc MERLIN
@ 2016-11-07 3:36 ` Phil Turmel
0 siblings, 0 replies; 26+ messages in thread
From: Phil Turmel @ 2016-11-07 3:36 UTC (permalink / raw)
To: Marc MERLIN, NeilBrown
Cc: Roman Mamedov, Neil Brown, Andreas Klauer, linux-raid
On 11/06/2016 08:13 PM, Marc MERLIN wrote:
> On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
>> Is this a 32bit system you are using? Such systems can only support
>> buffered IO up to 8TB. If you use iflag=direct to avoid buffering, you
>> should get access to the whole device.
>
> You found the problem, and you also found the reason why btrfs_tools
> also fails past 8GB. It is indeed a 32bit distro. If I put a 64bit
> kernel with the 32bit userland, there is a weird problem with a sound
> driver/video driver sync, so I've stuck with 32bits.
Huh. Learn something new every day, I suppose. Never would have
thought of this. Thanks, Neil.
Phil
* [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-30 16:34 ` Phil Turmel
2016-10-30 17:12 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
@ 2016-10-30 18:56 ` TomK
2016-10-30 19:16 ` TomK
` (2 more replies)
1 sibling, 3 replies; 26+ messages in thread
From: TomK @ 2016-10-30 18:56 UTC (permalink / raw)
To: linux-raid
Hey Guys,
We recently saw a situation where smartctl -A errored out eventually in
a short time of a few days the disk cascaded into bad blocks eventually
becoming a completely unrecognizable SATA disk. It had apparently been
limping along for 6 months, causing random timeouts and slowdowns when
accessing the array, but the RAID array did not pull it out or mark it as
bad. The RAID 6 we have has been running for 6 years; we did have a lot of
disk replacements in it, yet it was always very, very reliable. The disks
started out as all 1TB Seagates but are now 2 2TB WDs, 1 2TB Seagate, 2
remaining 1TB Seagates, and the last one a 1.5TB, with a mix of green, red,
blue etc. Yet very rock solid.
We did not do a thorough R/W test to see how the error and bad disk
affected the data stored on the array, but we did notice pauses and
slowdowns on the CIFS share presented from it, and general difficulty in
reading data, though no data errors that we could see. Since then we
replaced the 2TB Seagate with a new 2TB WD and everything is fine even
though the array is degraded. But as soon as we put this bad disk back in,
it reverted to its previous behaviour. Yet the array didn't catch it as a
failed disk until the disk was nearly completely inaccessible.
So the question is how come the mdadm RAID did not catch this disk as a
failed disk and pull it out of the array? It seems this disk had been going
bad for a while, but as long as the array reported all 6 members healthy,
there was no cause for alarm. Also, how does the array not detect the disk
failure while issues show up in the applications using the array? Removing
the disk and leaving the array in a degraded state also solved the
accessibility issue on the array. So it appears the disk was generating
some sort of errors (possibly a bad PCB) that were not caught before.
Looking at the changelogs, has a similar case been addressed?
On a separate topic, if I eventually expand the array to 6 2TB disks,
will the array be smart enough to allow me to expand it to the new size?
Have not tried that yet and wanted to ask first.
Cheers,
Tom
[root@mbpc-pc modprobe.d]# rpm -qf /sbin/mdadm
mdadm-3.3.2-5.el6.x86_64
[root@mbpc-pc modprobe.d]#
(The 100% util lasts roughly 30 seconds)
10/23/2016 10:18:20 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.25 25.19 0.00 74.56
Device:   rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb         0.00    0.00   0.00   1.00    0.00     2.50     5.00     0.03  27.00  27.00   2.70
sdc         0.00    0.00   0.00   1.00    0.00     2.50     5.00     0.01  15.00  15.00   1.50
sdd         0.00    0.00   0.00   1.00    0.00     2.50     5.00     0.02  18.00  18.00   1.80
sde         0.00    0.00   0.00   1.00    0.00     2.50     5.00     0.02  23.00  23.00   2.30
sdf         0.00    0.00   0.00   0.00    0.00     0.00     0.00     1.15   0.00   0.00 100.00
sdg         0.00    2.00   1.00   4.00    4.00   172.00    70.40     0.04   8.40   2.80   1.40
sda         0.00    0.00   0.00   1.00    0.00     2.50     5.00     0.04  37.00  37.00   3.70
sdh         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdj         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdk         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdi         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
fd0         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00   1.00   6.00    4.00   172.00    50.29     0.05   7.29   2.00   1.40
dm-1        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
md0         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-4        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-5        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-6        0.00    0.00   0.00   0.00    0.00     0.00     0.00     1.00   0.00   0.00 100.00
10/23/2016 10:18:21 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.25 24.81 0.00 74.94
Device:   rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdc         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdd         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sde         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdf         0.00    0.00   0.00   0.00    0.00     0.00     0.00     2.00   0.00   0.00 100.00
sdg         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdh         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdj         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdk         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdi         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
fd0         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-1        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-2        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
md0         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-4        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-5        0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
dm-6        0.00    0.00   0.00   0.00    0.00     0.00     0.00     1.00   0.00   0.00 100.00
We can see that /dev/sdf ramps up to 100% starting at around 10/23/2016
10:18:18 PM and stays that way until about the 10/23/2016 10:18:42 PM mark,
when something occurs and it drops back below 100%.
So I checked the array which shows all clean, even across reboots:
[root@mbpc-pc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[7] sdf[6] sdd[3] sda[5] sdc[1] sde[8]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6]
[UUUUUU]
bitmap: 1/8 pages [4KB], 65536KB chunk
unused devices: <none>
[root@mbpc-pc ~]#
Then I run smartctl across all disks, and sure enough /dev/sdf prints this:
[root@mbpc-pc ~]# smartctl -A /dev/sdf
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-4.8.4] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Error SMART Values Read failed: scsi error badly formed scsi parameters
Smartctl: SMART Read Values failed.
=== START OF READ SMART DATA SECTION ===
[root@mbpc-pc ~]#
* Re: [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
@ 2016-10-30 19:16 ` TomK
2016-10-30 20:13 ` Andreas Klauer
2016-10-31 19:29 ` Wols Lists
2 siblings, 0 replies; 26+ messages in thread
From: TomK @ 2016-10-30 19:16 UTC (permalink / raw)
To: linux-raid
On 10/30/2016 2:56 PM, TomK wrote:
> Hey Guy's,
>
> We recently saw a situation where smartctl -A errored out eventually in
> a short time of a few days the disk cascaded into bad blocks eventually
> becoming a completely unrecognizable SATA disk. It apparently was
> limping along for 6 months causing random timeout and slowdowns
> accessing the array. But the RAID array did not pull it out or and did
> not mark it as bad. The RAID 6 we have has been running for 6 years,
> however we did have alot of disk replacements in it yet it was always
> very very reliable. Disks started as all 1TB Seagates but are now 2 WD
> 2TB, 1 2TB Seagate with 2 left as 1TB Seagates and the last one as
> 1.5TB. Has a mix of green, red, blue etc. Yet very rock solid.
>
Bit trigger happy. Here's a better version of the first sentence. :)
We recently saw a situation where smartctl -A errored out but mdadm
didn't pick this up. Eventually, in a short time of a few days, the disk
cascaded into bad blocks then became a completely unrecognizable SATA disk.
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.
* Re: [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
2016-10-30 19:16 ` TomK
@ 2016-10-30 20:13 ` Andreas Klauer
2016-10-30 21:08 ` TomK
2016-10-31 19:29 ` Wols Lists
2 siblings, 1 reply; 26+ messages in thread
From: Andreas Klauer @ 2016-10-30 20:13 UTC (permalink / raw)
To: TomK; +Cc: linux-raid
On Sun, Oct 30, 2016 at 02:56:58PM -0400, TomK wrote:
> So the question is how come the mdadm RAID did not catch this disk as a
> failed disk and pull it out of the array?
RAID doesn't know about SMART. It's that simple.
If SMART already knows about errors - too bad, RAID doesn't care.
It also doesn't know about anything else really. You ddrescue the
member disk directly and it finds tons of errors... RAID isn't involved.
RAID will only kick when it by itself stumbles over an error that does
not go away when rewriting data. Or when the drive just doesn't respond
anymore for an extended period of time. And that timeout is per request
so a bad disk can grind the entire system to a halt without ever being kicked.
ddrescue has this nice --min-read-rate option, any zone that yields data
slower will be considered a hopeless case, RAID does not have such magic.
If your drive always responds and always claims to successfully write
even when it doesn't, then RAID will never kick it.
If you never run array checks or smart selftests, errors won't show.
RAID will show them as healthy, SMART will show them as healthy,
doesn't mean diddly-squat until you actually test it. Regularly.
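For example, roughly (device names taken from your mail, adjust as needed):

  echo check > /sys/block/md0/md/sync_action   # md consistency check / scrub
  smartctl -t long /dev/sdf                    # SMART self-test; read the result later with smartctl -a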
Kicking drives yourself is quite normal. RAID only does so much.
This is why we have mdadm --replace, that way even a semi-broken disk
can help with the rebuild effort and bad sectors on other disks won't
result in an even bigger problem, or at least, not right away.
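Roughly like this, assuming /dev/sdX is a freshly added replacement disk
(needs a reasonably recent kernel and mdadm, which you have):

  mdadm /dev/md0 --add /dev/sdX        # add the new disk as a spare
  mdadm /dev/md0 --replace /dev/sdf    # rebuild onto the spare while sdf stays in the array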
If you leave RAID to its own devices, it has a much higher chance of dying
than if you run tests, and actually decide to do something once *you're*
aware that there are problems that RAID itself isn't aware of.
> On a separate topic, if I eventually expand the array to 6 2TB disks,
> will the array be smart enough to allow me to expand it to the new size?
Yes. Perhaps after an additional --grow --size=max.
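i.e. once the last small member has been replaced and resynced, something
like:

  mdadm --grow /dev/md0 --size=max     # use all the space available on the members
  # then grow whatever sits on top of md0 (LVM / filesystem) separately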
Regards
Andreas Klauer
* Re: [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-30 20:13 ` Andreas Klauer
@ 2016-10-30 21:08 ` TomK
0 siblings, 0 replies; 26+ messages in thread
From: TomK @ 2016-10-30 21:08 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
On 10/30/2016 4:13 PM, Andreas Klauer wrote:
> On Sun, Oct 30, 2016 at 02:56:58PM -0400, TomK wrote:
>> So the question is how come the mdadm RAID did not catch this disk as a
>> failed disk and pull it out of the array?
>
> RAID doesn't know about SMART. It's that simple.
>
> If SMART already knows about errors - too bad, RAID doesn't care.
> It also doesn't know about anything else really. You ddrescue the
> member disk directly and it finds tons of errors... RAID isn't involved.
>
> RAID will only kick when it by itself stumbles over an error that does
> not go away when rewriting data. Or when the drive just doesn't respond
> anymore for an extended period of time. And that timeout is per request
> so a bad disk can grind the entire system to a halt without ever kicked.
>
> ddrescue has this nice --min-read-rate option, any zone that yields data
> slower will be considered a hopeless case, RAID does not have such magic.
> If your drive always responds and always claims to successfully write
> even when it doesn't, then RAID will never kick it.
>
> If you never run array checks or smart selftests, errors won't show.
> RAID will show them as healthy, SMART will show them as healthy,
> doesn't mean diddly-squat until you actually test it. Regularly.
>
> Kicking drives yourself is quite normal. RAID only does so much.
> This is why we have mdadm --replace, that way even a semi-broken disk
> can help with the rebuild effort and bad sectors on other disks won't
> result in an even bigger problem, or at least, not right away.
>
> If you leave RAID to its own devices, it has a much higher chance of dying
> than if you run tests, and actually decide to do something once *you're*
> aware that there are problems that RAID itself isn't aware of.
>
>> On a separate topic, if I eventually expand the array to 6 2TB disks,
>> will the array be smart enough to allow me to expand it to the new size?
>
> Yes. Perhaps after an additional --grow --size=max.
>
> Regards
> Andreas Klauer
>
Very clear. Thanks Andreas!
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.
* Re: [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
2016-10-30 19:16 ` TomK
2016-10-30 20:13 ` Andreas Klauer
@ 2016-10-31 19:29 ` Wols Lists
2016-11-01 2:40 ` TomK
2 siblings, 1 reply; 26+ messages in thread
From: Wols Lists @ 2016-10-31 19:29 UTC (permalink / raw)
To: TomK, linux-raid
On 30/10/16 18:56, TomK wrote:
>
> We did not do a thorough R/W test to see how the error and bad disk
> affected the data stored on the array but did notice pauses and
> slowdowns on the CIFS share presented from it with pauses and generally
> difficulty in reading data, however no data errors that we could see.
> Since then we replaced the 2TB Seagate with a new 2TB WD and everything
> is fine even if the array is degraded. But as soon as we put in this
> bad disk, it degraded to it's previous behaviour. Yet the array didn't
> catch it as a failed disk until the disk was nearly completely
> inaccessible.
What is this 2TB Seagate? A Barracuda? There's your problem, quite
possibly. Sounds like you've got your timeouts correctly matched, so
this drive is responding, but taking ages to do so. And that's why it
doesn't get kicked, but it knackers system response times - the kernel
is correctly configured to wait for the geriatric to respond.
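(For anyone wanting to check their own setup, comparing the two timeouts is
roughly a matter of the following, with /dev/sdX standing in for each array
member:

  smartctl -l scterc /dev/sdX                # drive's error recovery timeout (ERC/TLER), if supported
  cat /sys/block/sdX/device/timeout          # kernel SCSI command timeout, in seconds
)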
Cheers,
Wol
* Re: [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
2016-10-31 19:29 ` Wols Lists
@ 2016-11-01 2:40 ` TomK
0 siblings, 0 replies; 26+ messages in thread
From: TomK @ 2016-11-01 2:40 UTC (permalink / raw)
To: Wols Lists, linux-raid
On 10/31/2016 3:29 PM, Wols Lists wrote:
> On 30/10/16 18:56, TomK wrote:
>>
>> We did not do a thorough R/W test to see how the error and bad disk
>> affected the data stored on the array but did notice pauses and
>> slowdowns on the CIFS share presented from it with pauses and generally
>> difficulty in reading data, however no data errors that we could see.
>> Since then we replaced the 2TB Seagate with a new 2TB WD and everything
>> is fine even if the array is degraded. But as soon as we put in this
>> bad disk, it degraded to it's previous behaviour. Yet the array didn't
>> catch it as a failed disk until the disk was nearly completely
>> inaccessible.
>
> What is this 2TB Seagate? A Barracuda? There's your problem, quite
> possibly. Sounds like you've got your timeouts correctly matched, so
> this drive is responding, but taking ages to do so. And that's why it
> doesn't get kicked, but it knackers system response times - the kernel
> is correctly configured to wait for the geriatric to respond.
>
> Cheers,
> Wol
>
Hey Wols,
It's a roughly 2-3 year old Seagate, but not a Barracuda. They did not
come with high ratings back then. I also adjust the other recommended
settings, like write caches etc.
With the previous answer provided by Andreas, I got a very good picture
what scope of issues RAID should cover and what is not.
So rightly so there is a gap where RAID will not cover all disk failures
while the disk may impact the applications sitting on top of the array.
Where I was going with this as well is to help me identify what other
tools I may need in solutions that use RAID. In this case the answer
Andreas provided tells me I have to have specific software for disk
monitoring to the array that would tell me potential issues ahead of
time alongside the RAID.
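As a minimal sketch of what that monitoring could look like (the mail
address and test schedule below are illustrative assumptions, not taken
from this thread):

  # /etc/smartd.conf: watch all drives, enable offline data collection and
  # attribute autosave, run a long self-test every Saturday at 03:00, and
  # mail on SMART health or attribute problems
  DEVICESCAN -a -o on -S on -s L/../../6/03 -m root@localhost

  # and let md itself report degraded arrays / failed devices by mail
  mdadm --monitor --scan --daemonise --mail=root@localhost

smartd catches the drive-level symptoms (pending or reallocated sectors,
failed self-tests) that md only notices once a read actually fails.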
On a side note, I like to see the RAID mailing lists so busy. If I were
to read the various blog posts, I would believe RAID died 5 years ago. :)
--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 16:19 ` Andreas Klauer
2016-10-30 16:34 ` Phil Turmel
@ 2016-10-30 16:43 ` Marc MERLIN
2016-10-30 17:02 ` Andreas Klauer
2016-10-31 19:24 ` Wols Lists
1 sibling, 2 replies; 26+ messages in thread
From: Marc MERLIN @ 2016-10-30 16:43 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
On Sun, Oct 30, 2016 at 05:19:29PM +0100, Andreas Klauer wrote:
> On Sun, Oct 30, 2016 at 08:38:57AM -0700, Marc MERLIN wrote:
> > (mmmh, but even so, rebuilding the spare should have cleared the bad blocks
> > on at least one drive, no?)
>
> If n+1 disks have bad blocks there's no data to sync over, so they just
> propagate and stay bad forever. Or at least that's how it seemed to work
> last time I tried it. I'm not familiar with bad blocks. I just turn it off.
>
> As long as the bad block list is empty you can --update=no-bbl.
> If everything else fails - edit the metadata or carefully recreate.
> Which I don't recommend because you can go wrong in a hundred ways.
Right.
There should be some --update=no-bbl --force for when the admin knows the
bad block list is wrong and was caused by IO issues not related to the
drive itself.
> I don't remember if anyone ever had a proper solution to this.
> It came up a couple of times on the list so you could search.
Will look, thanks.
> If you've replaced drives since, the drive that has been part of the array
> the longest is probably the most likely to still have valid data in there.
> That could be synced over to the other drives once the bbl is cleared.
> It might not matter, you'd have to check with your filesystems if they
> believe any files are located there. (Filesystems sometimes maintain their
> own bad block lists so you'd have to check those too.)
No drives were ever replaced, this is an original array used only a few
times (for backups).
At this point I'm almost tempted to wipe and start over, but it's going to
take a week to recreate the backup (lots of data, slow link).
As for the filesystem, it's btrfs with data and metadata checksums, so it's
easy to verify that everything is fine once I can get md5 to stop returning
IO errors on blocks it thinks are bad but in fact are not.
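(A sketch of that verification step, assuming the filesystem is mounted
at /mnt/backup, which is just an example path:

  btrfs scrub start -B /mnt/backup   # -B: run in the foreground and print stats
  btrfs scrub status /mnt/backup

Once md stops returning IO errors, a clean scrub, which re-reads and
checksums all data and metadata, would confirm the data under the
wrongly-marked blocks is intact.)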
And there isn't one good drive between the two: the bad blocks are identical
on both drives and must have happened at the same time, due to those
cable-induced IO errors I mentioned.
Too bad that mdadm doesn't seem to account for the possibility that it could
be wrong when marking blocks as bad, and doesn't seem to give an easy way to
recover from this...
I'll do more reading, thanks.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 16:43 ` Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
@ 2016-10-30 17:02 ` Andreas Klauer
2016-10-31 19:24 ` Wols Lists
1 sibling, 0 replies; 26+ messages in thread
From: Andreas Klauer @ 2016-10-30 17:02 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-raid
On Sun, Oct 30, 2016 at 09:43:42AM -0700, Marc MERLIN wrote:
> Right.
> There should be some --update=no-bbl --force if the admin knows the bad
> block list is wrong and due to IO issues not related to the drive.
Good point. And hey, there it is.
mdadm.c
| if (strcmp(c.update, "bbl") == 0)
|         continue;
| if (strcmp(c.update, "no-bbl") == 0)
|         continue;
| if (strcmp(c.update, "force-no-bbl") == 0)
|         continue;
force-no-bbl. It's in mdadm v3.4, not sure about older ones.
If I stumbled across that one before then I forgot about it.
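A sketch of how that could be applied to the array from this thread,
assuming it is stopped first and the members are still the same
partitions listed earlier (double-check the device list before running
anything):

  mdadm --stop /dev/md5
  mdadm --assemble /dev/md5 --update=force-no-bbl /dev/sd[defgh]1

--update only takes effect at assembly time, so the array has to be
re-assembled for the non-empty bad block lists to be dropped.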
Good luck
Andreas Klauer
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Buffer I/O error on dev md5, logical block 7073536, async page read
2016-10-30 16:43 ` Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
2016-10-30 17:02 ` Andreas Klauer
@ 2016-10-31 19:24 ` Wols Lists
1 sibling, 0 replies; 26+ messages in thread
From: Wols Lists @ 2016-10-31 19:24 UTC (permalink / raw)
To: Marc MERLIN, Andreas Klauer; +Cc: linux-raid
On 30/10/16 16:43, Marc MERLIN wrote:
> And here isn't one good drive between the 2, the bad blocks are identical on
> both drives and must have happened at the same time due to those cable
> induced IO errors I mentionned.
> Too bad that mdadm doesn't seem to account for the fact that it could be
> wrong when marking blocks as bad and does not seem to give a way to recover
> from this easily....
> I'll do more reading, thanks.
Reading the list, I've picked up that somehow badblocks seem to get
propagated from one drive to another. So if one drive gets a badblock,
that seems to get marked as bad on other drives too :-(
Oh - and as for badblocks being obsolete, isn't there a load of work
being done on it at the moment? For hardware RAID, I believe, which
presumably does not handle badblocks the way Phil thinks all modern
drives do? (Not surprising - hardware RAID is regularly slated for being
buggy and not a good idea, so this is probably more of the same...)
Cheers,
Wol
^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2016-10-30 2:16 Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
2016-10-30 9:33 ` Andreas Klauer
2016-10-30 15:38 ` Marc MERLIN
2016-10-30 16:19 ` Andreas Klauer
2016-10-30 16:34 ` Phil Turmel
2016-10-30 17:12 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
2016-10-30 17:16 ` Marc MERLIN
2016-11-04 18:18 ` Marc MERLIN
2016-11-04 18:22 ` Phil Turmel
2016-11-04 18:50 ` Marc MERLIN
2016-11-04 18:59 ` Roman Mamedov
2016-11-04 19:31 ` Roman Mamedov
2016-11-04 20:02 ` Marc MERLIN
2016-11-04 19:51 ` Marc MERLIN
2016-11-07 0:16 ` NeilBrown
2016-11-07 1:13 ` Marc MERLIN
2016-11-07 3:36 ` Phil Turmel
2016-10-30 18:56 ` [ LR] Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds TomK
2016-10-30 19:16 ` TomK
2016-10-30 20:13 ` Andreas Klauer
2016-10-30 21:08 ` TomK
2016-10-31 19:29 ` Wols Lists
2016-11-01 2:40 ` TomK
2016-10-30 16:43 ` Buffer I/O error on dev md5, logical block 7073536, async page read Marc MERLIN
2016-10-30 17:02 ` Andreas Klauer
2016-10-31 19:24 ` Wols Lists