* Troubleshooting "Buffer I/O error" on reading md device @ 2018-01-02 2:46 RQM 2018-01-02 3:13 ` Reindl Harald 2018-01-02 4:28 ` NeilBrown 0 siblings, 2 replies; 12+ messages in thread From: RQM @ 2018-01-02 2:46 UTC (permalink / raw) To: linux-raid@vger.kernel.org Hello everyone, I hope this list is the right place to ask the following: I've got a 5-disk RAID-5 array that's been built by a QNAP NAS device, which has recently failed (I suspect a faulty SATA controller or backplane). I migrated the disks to a desktop computer that runs Debian stretch (kernel 4.9.65-3+deb9u1 amd64) and mdadm version 3.4. Although the array can be assembled, I encountered the following error in my dmesg output ([1], recorded directly after a recent reboot and fsck attempt) when running fsck: Buffer I/O error on dev md0, logical block 1598030208, async page read I can reliably reproduce that error by trying to read from the md0 device. It's always the same block, also across reboots. I have suspected that possibly, one of the drives involved is faulty. Although smart errors have been logged [2], the errors are not recent enough to correlate with the fsck run. Also, I had sha1sum complete without error on every one of the individual disk devices /dev/sd[b-f], so reading from the drives does not provoke an error. Finally, I tried scrubbing the array by writing repair to md/sync_action. The process completed without any output to dmesg or signs of trouble in /proc/mdstat. However, reading from the array still fails at the same block as above, 1598030208. Here's the output of mdadm --detail /dev/md0: [3] I assume the md driver would know what exactly the problem is, but I don't know where to look to find that information. How can I proceed troubleshooting this issue? FYI, I had posted this on serverfault [4] previously, but unfortunately didn't arrive at a conclusion. Thank you very much in advance! [1] https://paste.ubuntu.com/26303735/ [2] https://paste.ubuntu.com/26303737/ [3] https://paste.ubuntu.com/26303754/ [4] https://serverfault.com/questions/889687/troubleshooting-buffer-i-o-error-on-software-raid-md-device ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: Reindl Harald
  Date: 2018-01-02  3:13 UTC
  To: RQM, linux-raid@vger.kernel.org

On 02.01.2018 at 03:46, RQM wrote:
> I hope this list is the right place to ask the following:
>
> I've got a 5-disk RAID-5 array that was built by a QNAP NAS device, which has recently failed (I suspect a faulty SATA controller or backplane).
> I migrated the disks to a desktop computer that runs Debian stretch (kernel 4.9.65-3+deb9u1 amd64) and mdadm version 3.4. Although the array can be assembled, I encountered the following error in my dmesg output ([1], recorded directly after a recent reboot and fsck attempt) when running fsck:
>
>     Buffer I/O error on dev md0, logical block 1598030208, async page read
>
> I can reliably reproduce that error by trying to read from the md0 device. It's always the same block, also across reboots.

I had the same message on my test-server VM (running under VMware Workstation) after upgrading to one of the first 4.14 kernels on Fedora, for /dev/sdb1 (the rootfs), and it went away as suddenly as it came. In the case of a virtual disk, faulty hardware is all but impossible, or at least I would expect such a message on the underlying RAID-10 rather than in a random guest.
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: NeilBrown
  Date: 2018-01-02  4:28 UTC
  To: RQM, linux-raid@vger.kernel.org

On Mon, Jan 01 2018, RQM wrote:

> I've got a 5-disk RAID-5 array that was built by a QNAP NAS device, which has recently failed (I suspect a faulty SATA controller or backplane).
> [...]
>     Buffer I/O error on dev md0, logical block 1598030208, async page read
>
> I can reliably reproduce that error by trying to read from the md0 device. It's always the same block, also across reboots.
> [...]
> I assume the md driver would know what exactly the problem is, but I don't know where to look to find that information. How can I proceed troubleshooting this issue?

This is truly weird. I'd even go so far as to say that it cannot possibly happen (but I've been wrong before).

Step one is to confirm that it is easy to reproduce. Does

    dd if=/dev/md0 bs=4K skip=1598030208 count=1 of=/dev/null

trigger the message reliably? To check that "4K" is the correct blocksize, run

    blockdev --getbsz /dev/md0

and use whatever number it gives as 'bs='. If you cannot reproduce like that, try a larger count, and then a smaller skip with a large count.

Once you can reproduce with minimal IO, do

    echo file:raid5.c +p > /sys/kernel/debug/dynamic_debug/control
    # repeat experiment
    echo file:raid5.c -p > /sys/kernel/debug/dynamic_debug/control

and report the messages that appear in 'dmesg'.

Also report "mdadm -E" of each member device, and the kernel version (though I see that is in the serverfault report: 4.9.30-2+deb9u5).

Then run

    blktrace /dev/md0 /dev/sd[acdef]

in one window while reproducing the error again in another window. Then interrupt the blktrace. This will produce several blktrace output files. Create a tar.gz of these and put them somewhere that I can get them - hopefully they won't be too big.

With all this information, I can poke around and will hopefully be able to explain in fine detail exactly why this cannot possibly happen (unless it turns out that I'm wrong again).

Thanks,
NeilBrown
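As a small aside on the capture step: blktrace writes one file per traced device and CPU, typically named <device>.blktrace.<cpu>, so a sketch of the whole capture (member names taken from the message above) might look like this:

    # window 1: start tracing the array and its members; Ctrl-C once the read has failed
    blktrace /dev/md0 /dev/sd[acdef]

    # window 2: reproduce the failing read
    dd if=/dev/md0 bs=4K skip=1598030208 count=1 of=/dev/null

    # back in window 1, after interrupting blktrace: package the per-CPU trace files
    tar czf blktrace-out.tar.gz *.blktrace.*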
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: RQM
  Date: 2018-01-02 10:40 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org

Hello,

thanks for the quick and helpful responses! Answers inline:

> Step one is to confirm that it is easy to reproduce. Does
>
>     dd if=/dev/md0 bs=4K skip=1598030208 count=1 of=/dev/null
>
> trigger the message reliably? To check that "4K" is the correct blocksize, run
>
>     blockdev --getbsz /dev/md0
>
> and use whatever number it gives as 'bs='.

blockdev does indeed report a blocksize of 4096, and the dd line does reliably trigger

    dd: error reading '/dev/md0': Input/output error

and the same line in dmesg as before.

> Once you can reproduce with minimal IO, do
>
>     echo file:raid5.c +p > /sys/kernel/debug/dynamic_debug/control
>     # repeat experiment
>     echo file:raid5.c -p > /sys/kernel/debug/dynamic_debug/control
>
> and report the messages that appear in 'dmesg'.

I had to replace the colon with a space in those two lines (otherwise I would get "bash: echo: write error: Invalid argument"), but after that, this is what I got in dmesg: https://paste.ubuntu.com/26305369/

> Also report "mdadm -E" of each member device, and the kernel version (though I see that is in the serverfault report: 4.9.30-2+deb9u5).

mdadm -E says: https://paste.ubuntu.com/26305379/

The kernel has been updated between the serverfault post and my first mail to this list to 4.9.65-3+deb9u1. No changes since.

> Then run
>
>     blktrace /dev/md0 /dev/sd[acdef]
>
> in one window while reproducing the error again in another window. Then interrupt the blktrace. This will produce several blktrace output files. Create a tar.gz of these and put them somewhere that I can get them - hopefully they won't be too big.

I had to adjust the last blktrace argument to /dev/sd[b-f] since the names of the drives changed after the last reboot, but here's the output: https://filebin.ca/3mnjUz1OIXqm/blktrace-out.tar.gz
I also included the blktrace terminal output in there.

Thank you so much for the effort! Please let me know if you need anything.
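For completeness, the dynamic_debug control syntax that worked here uses a space between the keyword and the filename; a sketch of the corrected toggle around the failing read:

    echo 'file raid5.c +p' > /sys/kernel/debug/dynamic_debug/control
    dd if=/dev/md0 bs=4K skip=1598030208 count=1 of=/dev/null
    dmesg | tail -n 60      # collect the raid5.c debug lines
    echo 'file raid5.c -p' > /sys/kernel/debug/dynamic_debug/control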
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: NeilBrown
  Date: 2018-01-02 21:27 UTC
  To: RQM
  Cc: linux-raid@vger.kernel.org

On Tue, Jan 02 2018, RQM wrote:

>> Once you can reproduce with minimal IO, do
>> [...]
>> and report the messages that appear in 'dmesg'.
>
> I had to replace the colon with a space in those two lines (otherwise I would get "bash: echo: write error: Invalid argument"), but after that, this is what I got in dmesg:
> https://paste.ubuntu.com/26305369/

    [Tue Jan  2 11:14:47 2018] locked=0 uptodate=0 to_read=1 to_write=0 failed=2 failed_num=3,2

So for this stripe, two devices appear to be failed: 3 and 2. As the two devices clearly are thought to be working, there must be a bad block recorded.

>> Also report "mdadm -E" of each member device, and the kernel version (though I see that is in the serverfault report: 4.9.30-2+deb9u5).
>
> mdadm -E says: https://paste.ubuntu.com/26305379/

I needed "mdadm -E" of the components of the array, so the partitions rather than the whole devices, e.g. /dev/sdb1, not /dev/sdb. This will show a non-empty bad block list on at least two devices.

You can remove the bad block by over-writing it:

    dd if=/dev/zero of=/dev/md0 bs=4K seek=1598030208 count=1

though that might corrupt some file containing the block. (Note that "seek" seeks in the output file, while "skip" skips over the input file.)

How did the bad block get there? A possible scenario is:
- A device fails and is removed from the array.
- A read error occurs on another device. Rather than failing the whole device, md records that block as bad.
- The failed device is replaced (or found to be a cabling problem) and recovered. Due to the bad block, the stripe cannot be recovered, so a bad block is recorded on the new device as well.

If the read error was really a cabling problem, then the original data might still be there. If it is, you could recover it and write it back to the array rather than writing from /dev/zero.

Finding out which file the failed block is part of is probably possible, but not necessarily easy. If you want to try, the first step is reporting what filesystem is on md0. If it is ext4, then debugfs can help. If something else - I don't know.

NeilBrown
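In case someone wants to follow the debugfs route mentioned above, a sketch of mapping the failing block to a file on ext4, assuming the filesystem sits directly on /dev/md0 with 4K blocks (so the filesystem block number matches the "logical block" in the kernel message); the inode number 12345 below is only a placeholder for whatever icheck reports:

    # which inode (if any) owns filesystem block 1598030208?
    debugfs -R 'icheck 1598030208' /dev/md0
    # translate that inode number into a pathname
    debugfs -R 'ncheck 12345' /dev/md0

If icheck reports no owning inode, the block lies in free space or filesystem metadata rather than in a file.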
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: Roger Heflin
  Date: 2018-01-02 22:30 UTC
  To: NeilBrown
  Cc: RQM, linux-raid@vger.kernel.org

The brute-force way to find the file is to walk all files and cat each one to /dev/null, checking for a bad return code from the cat command. The last time I had to do this, that was easier, and unless the filesystem is really, really big it should finish in a day or two. debugfs was not easy to understand and/or work with, and overall the brute-force method took less of my time to implement.

If find/cat does not find it, that would indicate the error is in free space or in the filesystem metadata.

On Tue, Jan 2, 2018 at 3:27 PM, NeilBrown <neilb@suse.com> wrote:
> Finding out which file the failed block is part of is probably possible, but not necessarily easy.
> If you want to try, the first step is reporting what filesystem is on md0. If it is ext4, then debugfs can help. If something else - I don't know.
> [...]
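A minimal sketch of that brute-force scan, assuming the filesystem is mounted read-only at /mnt (the mount point is illustrative); any path it prints is a file overlapping an unreadable block:

    mount -o ro /dev/md0 /mnt
    find /mnt -type f -print0 |
      while IFS= read -r -d '' f; do
        cat -- "$f" > /dev/null || printf 'read error: %s\n' "$f"
      done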
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: RQM
  Date: 2018-01-04 14:45 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org

Hello,

> I needed "mdadm -E" of the components of the array, so the partitions rather than the whole devices, e.g. /dev/sdb1, not /dev/sdb.

Sorry, that should have occurred to me. Here's the output: https://paste.ubuntu.com/26319689/

Indeed, bad blocks are present on two devices.

> You can remove the bad block by over-writing it:
>
>     dd if=/dev/zero of=/dev/md0 bs=4K seek=1598030208 count=1
>
> though that might corrupt some file containing the block.

I have tried that just now, but before running mdadm -E above. dd appears to succeed when writing to the bad block, but after that, reading that block with dd fails again:

    dd: error reading '/dev/md0': Input/output error

In dmesg, the following errors appear:

    [220444.068715] VFS: Dirty inode writeback failed for block device md0 (err=-5).
    [220445.850229] Buffer I/O error on dev md0, logical block 1598030208, async page read

I have repeated the dd write-then-read experiment, with identical results.

The filesystem is indeed ext4, but it's not of tremendous importance to me that all data is recovered, as the array contains backup data only. However, I would like to get the backup system back into operation, so I'd be very grateful for further hints on how to get the array into a usable state.

Thank you so much for your help so far!
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: NeilBrown
  Date: 2018-01-05  1:05 UTC
  To: RQM
  Cc: linux-raid@vger.kernel.org

On Thu, Jan 04 2018, RQM wrote:

> I have repeated the dd write-then-read experiment, with identical results.
>
> The filesystem is indeed ext4, but it's not of tremendous importance to me that all data is recovered, as the array contains backup data only. However, I would like to get the backup system back into operation, so I'd be very grateful for further hints on how to get the array into a usable state.

The easiest approach is to remove the bad block log. Stop the array, and then assemble with --update=no-bbl, e.g.:

    mdadm -S /dev/md0
    mdadm -A /dev/md0 --update=no-bbl /dev/sd[bcdef]3

Before you do that though, please take a dump of the metadata and send it to me, in case I get motivated to figure out why writing didn't work:

    mkdir /tmp/dump
    mdadm --dump /tmp/dump /dev/sd[bcdef]3
    tar czSf /tmp/dump.tgz /tmp/dump

The files in /tmp/dump are sparse images of the hard drives with only the metadata present. The 'S' flag to tar should cause it to notice this and create a tiny tgz file. Then send me /tmp/dump.tgz.

Thanks,
NeilBrown
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: RQM
  Date: 2018-01-05 12:55 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org

Hi,

here's the metadata dump: https://filebin.ca/3n9OgaeSlV6x/dump.tgz

When I try assembling with no-bbl, this is what I get:

    # mdadm -A /dev/md0 --update=no-bbl /dev/sd[bcdef]3
    mdadm: Cannot remove active bbl from /dev/sdc3
    mdadm: Cannot remove active bbl from /dev/sde3
    mdadm: /dev/md0 has been started with 5 drives.

The array does start up, but the behaviour regarding dd reads and writes remains as it was before: reads fail with the corresponding error messages in dmesg and on stdout/stderr, and writes fail too, but that is only indicated in dmesg.

By the way, I ran SMART long tests a day or two ago, and they reportedly completed without errors on all involved disks.

Thank you again so much for your help!

-------- Original Message --------
From: NeilBrown <neilb@suse.com>
Date: January 5, 2018 1:05 AM UTC

> The easiest approach is to remove the bad block log.
> Stop the array, and then assemble with --update=no-bbl.
> [...]
> Before you do that though, please take a dump of the metadata and send it to me, in case I get motivated to figure out why writing didn't work.
> [...]
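One way to double-check that the in-kernel bad-block list is what is still tripping the reads is to compare the kernel's view with the on-disk superblocks (device names as used in this thread):

    # per-member list the kernel is currently enforcing (sector offset and length)
    grep . /sys/block/md0/md/rd*/bad_blocks
    # on-disk view in each member's superblock
    mdadm -E /dev/sd[bcdef]3 | grep -i 'bad block'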
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: RQM
  Date: 2018-01-13 12:18 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org

Hello,

I have been made aware that the link I had supplied previously does not work anymore. Here's another attempt at uploading the `mdadm --dump /dev/sd[bcdef]3` output:

https://filebin.net/i0olmgzg52obnp0f/dump.tgz

Any help is greatly appreciated. Please do let me know whether you plan on working on this issue in the near future, because otherwise I will have to re-create a new array on these disks in order to put them into production again.

Thank you so much!
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: NeilBrown
  Date: 2018-02-02  1:55 UTC
  To: RQM
  Cc: linux-raid@vger.kernel.org

On Sat, Jan 13 2018, RQM wrote:

> Any help is greatly appreciated. Please do let me know whether you plan on working on this issue in the near future, because otherwise I will have to re-create a new array on these disks in order to put them into production again.

Sorry that it has taken me so long to get to this - January was a bit crazy.

Short answer is that if you assemble with --update=force-no-bbl, it will really, truly get rid of the bad block log. I really should add that to the man page.

Longer answer: if you assemble the array (without force-no-bbl) and run

    grep . /sys/block/md0/md/rd*/bad_blocks

you'll get

    /sys/block/md0/md/rd2/bad_blocks:3196060416 8
    /sys/block/md0/md/rd3/bad_blocks:3196060416 8

So that is a 4K block that is bad at the same location on 2 devices.

There is no data offset, and the chunk size is 64K (128 sectors), so using bc:

    % bc
    3196060416/(64*2)
    24969222
    3196060416%(64*2)
    0

The blocks are at the start of stripe 24969222. Each stripe is 4 data chunks, and a chunk is 64K, i.e. 16 4K blocks. So the block offset is close to

    24969222*4*16
    1598030208

which is exactly the "logical block" which was reported.

There are 5 devices, so the parity block rotates through the pattern

    D0 D1 D2 D3 P
    D1 D2 D3 P  D0
    D2 D3 P  D0 D1
    D3 P  D0 D1 D2
    P  D0 D1 D2 D3

and

    24969222%5
    2

so this should be row 2 (counting from 0):

    D2 D3 P  D0 D1

rd2 and rd3 are bad, so that is 'P' and 'D0'. So this confirms that it is just the first 4K block of that stripe which is bad.

Writing should fix it... but it doesn't. The write gets an IO error. Looking at the code I can see why. The fix isn't completely trivial; I'll have to think about it carefully. But for now --update=force-no-bbl should get you going.

NeilBrown
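Spelled out for this array (member names as used earlier in the thread), the forced removal would look roughly like this; note that clearing the log only forgets the bad-block records, it cannot restore whatever data those sectors should have held:

    mdadm -S /dev/md0
    mdadm -A /dev/md0 --update=force-no-bbl /dev/sd[bcdef]3
    # verify: both lists should now be empty
    grep . /sys/block/md0/md/rd*/bad_blocks
    # the previously failing read now goes to the member disks
    # (it may return stale data for that stripe)
    dd if=/dev/md0 bs=4K skip=1598030208 count=1 of=/dev/null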
* Re: Troubleshooting "Buffer I/O error" on reading md device
  From: Darshaka Pathirana
  Date: 2022-11-01 23:49 UTC
  To: linux-raid@vger.kernel.org

Hi,

I am reviving this thread because I stumbled over the same problem, except that I am running a RAID-1 setup. The server is (still) running Debian/stretch with mdadm 3.4-4+b1.

Basically this is what happens. Accessing the RAID fails:

    % sudo dd if=/dev/md0 of=/dev/null skip=3112437760 count=33554432
    dd: error reading '/dev/md0': Input/output error
    514936+0 records in
    514936+0 records out
    263647232 bytes (264 MB, 251 MiB) copied, 0.447983 s, 589 MB/s

dmesg output while trying to access the RAID:

    [Tue Nov  1 22:09:59 2022] Buffer I/O error on dev md0, logical block 389119087, async page read
    [Tue Nov  1 22:22:01 2022] Buffer I/O error on dev md0, logical block 389119087, async page read

Jumping to the 'logical block':

    % sudo blockdev --getbsz /dev/md0
    4096
    % sudo dd if=/dev/md0 of=/dev/null skip=389119087 bs=4096 count=33554432
    dd: error reading '/dev/md0': Input/output error
    0+0 records in
    0+0 records out
    0 bytes copied, 0.000129958 s, 0.0 kB/s

But the underlying disk seemed fine, which was strange:

    % sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null
    33554432+0 records in
    33554432+0 records out
    17179869184 bytes (17 GB, 16 GiB) copied, 112.802 s, 152 MB/s
    sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null  9.18s user 29.80s system 34% cpu 1:52.81 total

Note: through trial and error I found the offset of /dev/md0 relative to /dev/sdb1 to be 262144 blocks (with block size 512). That's why skip is not the same in both commands.

After a very long search I found this thread, and yes, there is a bad block log:

    % cat /sys/block/md0/md/rd*/bad_blocks
    3113214840 8
    % sudo mdadm -E /dev/sdb1 | grep Bad
    Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

The other disk of that RAID has been removed, because it had SMART errors and is about to be replaced. Only then did I notice the input/output error.

I am not sure how to proceed from here. Do you have any advice?

On 2018-02-02 02:55, NeilBrown wrote:
>
> Short answer is that if you assemble with --update=force-no-bbl, it will really, truly get rid of the bad block log. I really should add that to the man page.

*friendly wave*

> Longer answer: if you assemble the array (without force-no-bbl) and run
> [...]
> So this confirms that it is just the first 4K block of that stripe which is bad.
> Writing should fix it... but it doesn't. The write gets an IO error.
>
> Looking at the code I can see why. The fix isn't completely trivial; I'll have to think about it carefully.

I am curious: did you come up with a solution?

Best & thx for your help,
- Darsha

P.S. I am not subscribed, please put me on CC.
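Given the advice quoted above, a sketch of what the same fix might look like for this degraded RAID-1 (device names taken from this message; this assumes the installed mdadm understands --update=force-no-bbl, and clearing the log only removes the record - it cannot recover data that was never readable from those sectors):

    mdadm -S /dev/md0
    mdadm -A /dev/md0 --update=force-no-bbl /dev/sdb1
    # verify the list is gone, then retry the read that used to fail
    grep . /sys/block/md0/md/rd*/bad_blocks
    dd if=/dev/md0 of=/dev/null bs=4096 skip=389119087 count=1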