From: NeilBrown <neilb@suse.com>
To: matt@digitallyhosted.com, linux-raid@vger.kernel.org
Subject: Re: badblocks seem to be causing problems with raid6 - badblocks list replicating all all drives
Date: Mon, 21 Dec 2015 15:00:16 +1100
Message-ID: <87si2w4fjj.fsf@notabene.neil.brown.name>
In-Reply-To: <3fafa3e9267b4ba0b5f9d61d3a416cf5@digitallyhosted.com>
On Thu, Nov 12 2015, matt@digitallyhosted.com wrote:
> Hello,
>
> I posted a while back about getting buffer i/o errors in my dmesg logs
> to my raid array, something along the lines of this:
>
> [158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955235712)
> [158219.456487] Buffer I/O error on device md4, logical block 4955235584
> [158219.456490] Buffer I/O error on device md4, logical block 4955235585
> [158219.456491] Buffer I/O error on device md4, logical block 4955235586
> [158219.456491] Buffer I/O error on device md4, logical block 4955235587
> [158219.456492] Buffer I/O error on device md4, logical block 4955235588
> [158219.456493] Buffer I/O error on device md4, logical block 4955235589
> [158219.456494] Buffer I/O error on device md4, logical block 4955235590
> [158219.456495] Buffer I/O error on device md4, logical block 4955235591
> [158219.456496] Buffer I/O error on device md4, logical block 4955235592
> [158219.456497] Buffer I/O error on device md4, logical block 4955235593
> [158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955235456)
> [158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955235200)
> [158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955234944)
> [158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955234688)
> [158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 125274714 (offset 176160768 size 8388608
> starting block 4955234432)
> [158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O
> error -5 writing to inode 123995503 (offset 0 size 8388608 starting
> block 4970080384)
> [158469.158281] buffer_io_error: 1526 callbacks suppressed
>
> I am now using the latest mainline kernel, 4.3.0 and I believe something
> is going wrong with the badblocks implementation.
>
> I originally had 3 drives, all with the same badblocks list. This array
> has been running a while so I have no idea how these 3 discs all ended
> up with the same list of badblocks.
>
> Now, if I remove any drive that has no badblock entries and re-add it,
> then once the sync is complete I end up with another drive with the same
> badblocks list.
An entry in the bad-blocks list means that the data at that location is
not available, possibly because the block is bad.
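If you have a degraded RAID6 where any address appears in 2 or more
bad-block lists, then it is not possible to recover the data at that
address when recovering onto a spare: the device being rebuilt plus the
two unreadable blocks make three missing blocks in that stripe, one more
than RAID6's two parity blocks can reconstruct. So the same address will
be added to the bad-block log on the spare.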
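The rule above can be illustrated with a small model. This is a hypothetical sketch, not md's actual code: it assumes each surviving member's bad-block list is a set of stripe addresses (md really stores start/length ranges), counts the spare being rebuilt as one missing block in every stripe, and reports an address as unrecoverable when the missing plus unreadable blocks exceed RAID6's two parity blocks. The function name and the addresses used are illustrative only.

```python
from collections import Counter

RAID6_PARITY = 2  # RAID6 can reconstruct at most two missing blocks per stripe


def spare_bad_blocks(member_bad_lists, missing=1, parity=RAID6_PARITY):
    """Return the addresses that cannot be reconstructed onto a spare.

    Hypothetical model, not md's code: each surviving member's bad-block
    list is a set of stripe addresses, and `missing` devices (the spare
    being rebuilt) are absent from every stripe.
    """
    # How many surviving members are unreadable at each address.
    counts = Counter(addr for bad in member_bad_lists for addr in bad)
    # Unrecoverable when missing + unreadable blocks exceed the parity count.
    return {addr for addr, n in counts.items() if missing + n > parity}


# The reported situation: three members share a list, one is clean,
# and a removed drive is being recovered as a spare.
members = [{100, 101}, {100, 101}, {100, 101}, set()]
print(sorted(spare_bad_blocks(members)))  # [100, 101]

# A single bad-block entry is still recoverable (1 missing + 1 bad = 2).
print(spare_bad_blocks([{200}, set(), set(), set()]))  # set()
```

Once the rebuild finishes, the spare carries the same entries as the other three members, which is the replication described in the report.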
You could remove the bad block from all the devices by writing to all of
the affected blocks at once, but that is admittedly a little difficult
to manage.
I probably need to make it possible to clear the bad-block log entry by
a successful write to just a single data block (and the matching parity
blocks). I've added that to my to-do list.
I've just pushed out a modification to mdadm so you can run

  mdadm --assemble --update=force-no-bbl /dev/md/whatever list-of-devices

and it will remove the bad-block lists even though they are not empty.
So if you run

  git clone git://neil.brown.name/mdadm
  cd mdadm
  make
  ./mdadm --stop /dev/md4
  ./mdadm --assemble /dev/md4 --update=force-no-bbl list-of-devices

it should get rid of your problem.
However, as your mail is 6 weeks old (I was on leave...), maybe you have
already found another solution.
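If the problem is still there, it may be worth checking which members actually carry non-empty bad-block lists before stopping the array and reassembling it with force-no-bbl. The sketch below assumes the md sysfs layout /sys/block/md4/md/dev-*/bad_blocks, with one "start length" pair (in sectors) per line; mdadm --examine-badblocks on a member device reports a member's list from its metadata as well. The helper names are hypothetical, not part of mdadm, and comparing sector numbers across members assumes they all use the same data offset.

```python
import glob
import os


def parse_bad_blocks(text):
    """Parse the text of an md member's bad_blocks sysfs file.

    Each line is assumed to hold a start sector and a length in sectors.
    """
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def read_array_bad_blocks(md="md4"):
    """Map each member directory name (e.g. 'dev-sdb1') to its ranges.

    Assumes the layout /sys/block/<md>/md/dev-*/bad_blocks; returns an
    empty dict if no such files exist.
    """
    result = {}
    for path in sorted(glob.glob(f"/sys/block/{md}/md/dev-*/bad_blocks")):
        member = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[member] = parse_bad_blocks(f.read())
    return result


def shared_ranges(per_member, min_members=2):
    """Return (start, length) entries listed on at least min_members members."""
    counts = {}
    for ranges in per_member.values():
        for r in set(ranges):  # count each member once per entry
            counts[r] = counts.get(r, 0) + 1
    return sorted(r for r, n in counts.items() if n >= min_members)


if __name__ == "__main__":
    lists = read_array_bad_blocks("md4")
    for member, ranges in lists.items():
        print(member, "empty" if not ranges else f"{len(ranges)} entries")
    print("entries on 2+ members:", shared_ranges(lists))
```

Entries present on two or more members are the ones that, by the rule explained earlier, would be copied onto any newly recovered spare. The comparison only matches identical (start, length) pairs, which is enough to spot lists that have been replicated wholesale but would miss ranges that merely overlap.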
NeilBrown