From: Darshaka Pathirana <dpat@syn-net.org>
To: linux-raid@vger.kernel.org
Subject: Re: Troubleshooting "Buffer I/O error" on reading md device
Date: Wed, 2 Nov 2022 00:49:32 +0100
Message-ID: <46045b2b-2aa5-5e75-2616-d28bbcb66786@syn-net.org>
In-Reply-To: <F_VxJAwrrEFTHG3fvMDQYPLKrS0w9yabmLk-nyrGRf3UQV-QxsnRSzojv6XQtCiKN7YMZ3sSOlbLduUL_apbYN25g7ouW-Th2S_fbMDgAEM=@protonmail.com>

Hi,

I am picking up this thread because I ran into the same problem,
except that I am running a RAID-1 setup.

The server is (still) running Debian/stretch with mdadm 3.4-4+b1.

Basically this is what happens:

Accessing the RAID fails:

  % sudo dd if=/dev/md0 of=/dev/null skip=3112437760 count=33554432
  dd: error reading '/dev/md0': Input/output error
  514936+0 records in
  514936+0 records out
  263647232 bytes (264 MB, 251 MiB) copied, 0.447983 s, 589 MB/s

dmesg output while trying to access the RAID:

  [Tue Nov  1 22:09:59 2022] Buffer I/O error on dev md0, logical block 389119087, async page read
  [Tue Nov  1 22:22:01 2022] Buffer I/O error on dev md0, logical block 389119087, async page read

Jumping to the 'logical block':

  % sudo blockdev --getbsz /dev/md0
  4096

  % sudo dd if=/dev/md0 of=/dev/null skip=389119087 bs=4096 count=33554432
  dd: error reading '/dev/md0': Input/output error
  0+0 records in
  0+0 records out
  0 bytes copied, 0.000129958 s, 0.0 kB/s
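
The two failures above point at the same spot on /dev/md0. A minimal
sketch of the arithmetic, assuming dd's default 512-byte block size for
the first command and the 4096-byte block size reported by blockdev for
the kernel message (the variable names are only illustrative):

```shell
# First dd run: started at sector 3112437760 and read 514936 full
# 512-byte records before hitting the I/O error.
dd_skip=3112437760
dd_records=514936
fail_sector=$((dd_skip + dd_records))

# Kernel message: logical block 389119087, counted in 4096-byte blocks.
logical_block=389119087
block_sector=$((logical_block * 4096 / 512))

echo "dd stopped at 512-byte sector:     $fail_sector"
echo "logical block as 512-byte sector:  $block_sector"
```

Both lines print 3112952696, so the dd run stopped exactly at the block
named in dmesg.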

But the underlying disk seemed ok, which was strange:

  % sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null
  33554432+0 records in
  33554432+0 records out
  17179869184 bytes (17 GB, 16 GiB) copied, 112.802 s, 152 MB/s
  sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null  9.18s user 29.80s system 34% cpu 1:52.81 total

Note: through trial and error I found the offset of /dev/md0 within
/dev/sdb1 to be 262144 blocks (with a block size of 512 bytes). That's
why skip is not the same for both commands.

After a lot of searching I found this thread, and yes, there is a bad
block log:

  % cat /sys/block/md0/md/rd*/bad_blocks
  3113214840 8

  % sudo mdadm -E /dev/sdb1 | grep Bad
    Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
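
The bad block entry lines up with the failing block on /dev/md0. A
minimal sketch of the conversion, assuming the entry is "first sector,
length" in 512-byte sectors counted from the start of /dev/sdb1, and
assuming the 262144-sector offset found above is the array's data offset
(the variable names are only illustrative):

```shell
# Entry as printed by /sys/block/md0/md/rd*/bad_blocks.
entry="3113214840 8"
bad_start=${entry% *}   # first bad sector on the component device
bad_len=${entry#* }     # number of bad sectors

# Offset of the md0 data area inside /dev/sdb1, in 512-byte sectors.
# Assumption: this is what "mdadm -E /dev/sdb1" reports as "Data Offset".
data_offset=262144

md_sector=$((bad_start - data_offset))   # same spot, relative to /dev/md0
md_block=$((md_sector * 512 / 4096))     # as a 4096-byte logical block
bad_bytes=$((bad_len * 512))

# Did the earlier successful read of /dev/sdb1 cover the bad sector?
read_start=3112437760
read_count=33554432
if [ "$bad_start" -ge "$read_start" ] && \
   [ "$bad_start" -lt $((read_start + read_count)) ]; then
  covered=yes
else
  covered=no
fi

echo "md0 sector: $md_sector, logical block: $md_block, size: $bad_bytes bytes"
echo "covered by the /dev/sdb1 read: $covered"
```

This gives md0 sector 3112952696, logical block 389119087 and a size of
4096 bytes, i.e. exactly the one block named in dmesg. The sector also
lies inside the range that dd read from /dev/sdb1 without error.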

The other disk of that RAID has been removed because it had SMART
errors and is about to be replaced. Only then did I notice the
input/output error.

I am not sure how to proceed from here. Do you have any advice?

On 2018-02-02 02:55, NeilBrown wrote:
>
> Short answer is that if you use
>   --assemble --force-no-bbl
> it will really truly get rid of the bad block log.  I really should add
> that to the man page.

*friendly wave*

> Longer answer:
> If you assemble the array (without force-no-bbl) and
>
> [...]
>
> So this should be row 2 (counting from 0)
> D2 D3 P  D0 D1
>
> rd2 and rd3 are bad, so that is 'P' and 'D0'.
>
> So this confirms that it is just the first 4K block of that stripe which
> is bad.
> Writing should fix it... but it doesn't.  The write gets an IO error.
>
> Looking at the code I can see why.  The fix isn't completely
> trivial. I'll have to think about it carefully.

I am curious: did you come up with a solution?

Best & thx for your help,
 - Darsha

P.s. I am not subscribed, please put me on CC.
