badblocks seem to be causing problems with raid6 - badblocks list replicating to all drives

From: matt @ 2015-11-12 11:34 UTC (permalink / raw)
  To: linux-raid

Hello,

I posted a while back about buffer I/O errors showing up in my dmesg 
logs on writes to my RAID array, along the lines of this:

[158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955235712)
[158219.456487] Buffer I/O error on device md4, logical block 4955235584
[158219.456490] Buffer I/O error on device md4, logical block 4955235585
[158219.456491] Buffer I/O error on device md4, logical block 4955235586
[158219.456491] Buffer I/O error on device md4, logical block 4955235587
[158219.456492] Buffer I/O error on device md4, logical block 4955235588
[158219.456493] Buffer I/O error on device md4, logical block 4955235589
[158219.456494] Buffer I/O error on device md4, logical block 4955235590
[158219.456495] Buffer I/O error on device md4, logical block 4955235591
[158219.456496] Buffer I/O error on device md4, logical block 4955235592
[158219.456497] Buffer I/O error on device md4, logical block 4955235593
[158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955235456)
[158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955235200)
[158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955234944)
[158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955234688)
[158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 125274714 (offset 176160768 size 8388608 
starting block 4955234432)
[158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
error -5 writing to inode 123995503 (offset 0 size 8388608 starting 
block 4970080384)
[158469.158281] buffer_io_error: 1526 callbacks suppressed

I am now using the latest mainline kernel, 4.3.0, and I believe something 
is going wrong with the badblocks implementation.

I originally had 3 drives, all with the same badblocks list.  This array 
has been running a while so I have no idea how these 3 discs all ended 
up with the same list of badblocks.

Now, if I remove any drive that has no badblock entries and re-add it, 
then once the sync is complete I end up with another drive with the same 
badblocks list.

At the moment 5 of the drives in the array all have the following 
entries (exactly the same):

Bad-blocks on /dev/sdi1:
           1938038928 for 512 sectors
           1938039440 for 512 sectors
           1938977144 for 512 sectors
           1938977656 for 512 sectors
           3303750816 for 512 sectors
           3303751328 for 512 sectors
           3313648904 for 512 sectors
           3313649416 for 512 sectors
           3313651976 for 512 sectors
           3313652488 for 512 sectors
           3418023432 for 512 sectors
           3418023944 for 512 sectors
           3418024456 for 512 sectors
           3418024968 for 512 sectors
           3418037768 for 512 sectors
           3418038280 for 512 sectors
           3418038792 for 512 sectors
           3418039304 for 512 sectors
           3418112520 for 512 sectors
           3418113032 for 512 sectors
           3418113544 for 512 sectors
           3418114056 for 512 sectors
           3418114568 for 512 sectors
           3418115080 for 512 sectors
           3418124808 for 512 sectors
           3418125320 for 512 sectors
           3418165768 for 512 sectors
           3418166280 for 512 sectors
           3418187272 for 512 sectors
           3418187784 for 512 sectors
           3418213224 for 512 sectors
           3418213736 for 512 sectors
           3418214248 for 512 sectors
           3418214760 for 512 sectors
           3418215272 for 512 sectors
           3418215784 for 512 sectors
           3420607528 for 512 sectors
           3420608040 for 512 sectors
           3420626984 for 512 sectors
           3420627496 for 512 sectors
           3448897824 for 512 sectors
           3448898336 for 512 sectors
           3458897888 for 512 sectors
           3458898400 for 512 sectors
           3519403992 for 512 sectors
           3519404504 for 512 sectors
           3617207456 for 512 sectors
           3617207968 for 512 sectors
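
To see how widely each entry has spread, one option is to save the
`mdadm --examine-badblocks` output of every member device to a file and
count, per entry, how many devices list it. The following is only a rough
sketch: the function name `bbl_spread` and the file names are made up for
illustration, and it assumes the output format shown above
(`<sector> for <length> sectors`).

```shell
#!/bin/sh
# bbl_spread FILE...
# Each FILE holds the saved output of: mdadm --examine-badblocks /dev/sdX1
# Prints "<devices> <start-sector> <length>" per distinct entry, sorted by sector.
bbl_spread() {
    awk '
        # Match entry lines such as "   1938038928 for 512 sectors".
        $2 == "for" && $4 == "sectors" && $1 ~ /^[0-9]+$/ {
            key = $1 " " $3
            if (!((FILENAME, key) in seen)) {   # count each file only once
                seen[FILENAME, key] = 1
                count[key]++
            }
        }
        END {
            for (key in count) print count[key], key
        }
    ' "$@" | sort -k2,2n
}

# Possible use (device names are placeholders, not taken from the array above):
#   for d in /dev/sd[a-i]1; do
#       mdadm --examine-badblocks "$d" > "bbl.$(basename "$d")"
#   done
#   bbl_spread bbl.*
```

An entry whose first column equals the number of files checked is present on
every one of those devices.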


How can I clear the badblocks list on all the drives? Something seems 
very wrong and I believe I only actually have 1 faulty disc (I have run 
smartctl long tests on all drives, only 1 failed).

If I can't clear them, how can I get ext4 to recognise the badblocks 
within the array so that it no longer attempts to write to those blocks?

Do the blocks in the list above map to blocks on the physical hard 
drive, or to blocks on the md device? In other words, if that block list 
were passed to the ext4 filesystem as bad sectors, would those be the 
correct locations on the array, or are they the bad blocks on one of the 
hard drives in the array?

Thanks

* Re: badblocks seem to be causing problems with raid6 - badblocks list replicating to all drives
From: NeilBrown @ 2015-12-21  4:00 UTC (permalink / raw)
  To: matt, linux-raid

On Thu, Nov 12 2015, matt@digitallyhosted.com wrote:

> Hello,
>
> I posted a while back about getting buffer i/o errors in my dmesg logs 
> to my raid array, something along the lines of this:
>
> [158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955235712)
> [158219.456487] Buffer I/O error on device md4, logical block 4955235584
> [158219.456490] Buffer I/O error on device md4, logical block 4955235585
> [158219.456491] Buffer I/O error on device md4, logical block 4955235586
> [158219.456491] Buffer I/O error on device md4, logical block 4955235587
> [158219.456492] Buffer I/O error on device md4, logical block 4955235588
> [158219.456493] Buffer I/O error on device md4, logical block 4955235589
> [158219.456494] Buffer I/O error on device md4, logical block 4955235590
> [158219.456495] Buffer I/O error on device md4, logical block 4955235591
> [158219.456496] Buffer I/O error on device md4, logical block 4955235592
> [158219.456497] Buffer I/O error on device md4, logical block 4955235593
> [158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955235456)
> [158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955235200)
> [158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955234944)
> [158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955234688)
> [158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 125274714 (offset 176160768 size 8388608 
> starting block 4955234432)
> [158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O 
> error -5 writing to inode 123995503 (offset 0 size 8388608 starting 
> block 4970080384)
> [158469.158281] buffer_io_error: 1526 callbacks suppressed
>
> I am now using the latest mainline kernel, 4.3.0 and I believe something 
> is going wrong with the badblocks implementation.
>
> I originally had 3 drives, all with the same badblocks list.  This array 
> has been running a while so I have no idea how these 3 discs all ended 
> up with the same list of badblocks.
>
> Now, if I remove any drive, which has no badblock entries, and re-add 
> it.  Once the sync is complete I end up with another drive with the same 
> badblocks list.

An entry in the bad-blocks list means that the data at that location is
not available, possibly because the block is bad.

If you have a degraded RAID6 where any address appears in 2 or more
bad-blocks lists, then it is not possible to recover the data at that
address when a spare is recovered.  So the same address will be added to
the bad block log on the spare.
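
The rule above comes down to counting: RAID6 keeps two parity blocks per
stripe, so at most two blocks of a stripe may be unavailable at once. A
minimal sketch of that arithmetic follows; the function name and its
arguments are hypothetical and this is not md's actual code.

```shell
#!/bin/sh
# raid6_can_rebuild MISSING BAD
#   MISSING: member devices absent or still being rebuilt (a recovering spare counts as one)
#   BAD:     remaining devices that list this address in their bad-block log
# Succeeds (exit status 0) when the stripe can still be reconstructed.
raid6_can_rebuild() {
    # RAID6 tolerates at most two unavailable blocks per stripe.
    [ $(( $1 + $2 )) -le 2 ]
}

# One device rebuilding, address bad on one other device: still recoverable.
raid6_can_rebuild 1 1 && echo "recoverable"

# One device rebuilding, address bad on two others: three blocks are
# unavailable, so the address is recorded as bad on the spare as well.
raid6_can_rebuild 1 2 || echo "unrecoverable: spare gets the same bad-block entry"
```

With three devices already sharing an entry, removing and re-adding any
healthy device gives `raid6_can_rebuild 1 3`, which fails, so the re-added
device picks up the same entry.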

You could remove the bad block from all the devices by writing to all of
the affected blocks at once, but that is admittedly a little difficult
to manage.

I probably need to make it possible to clear the bad block log by a
successful write to just a single data block (and the matching parity
blocks).  I've added that to my to-do list.

I've just pushed out a modification to mdadm so you can run
  mdadm --assemble --update=force-no-bbl /dev/md/whatever list-of-devices

and it will remove the bad-block lists even though they are not empty.
So if you

 git clone git://neil.brown.name/mdadm
 cd mdadm
 make
 ./mdadm --stop /dev/md4
 ./mdadm --assemble /dev/md4 --update=force-no-bbl list-of-devices

it should get rid of your problem.
However, as your mail is 6 weeks old (I was on leave...) maybe you have
already found another solution.
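
After reassembling, it may be worth confirming that no entries remain on any
member. A small sketch, assuming the `mdadm --examine-badblocks` output
format quoted earlier in the thread; `bbl_count` is a made-up helper name
and the device names in the comment are placeholders.

```shell
#!/bin/sh
# bbl_count: reads "mdadm --examine-badblocks" output on stdin and prints the
# number of entry lines of the form "<sector> for <length> sectors".
bbl_count() {
    awk '$2 == "for" && $4 == "sectors" && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Possible use (adjust the device list to the members of the array):
#   for d in /dev/sd[a-i]1; do
#       printf '%s: ' "$d"
#       mdadm --examine-badblocks "$d" | bbl_count
#   done
```

A count of 0 for every device would suggest the lists are gone.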

NeilBrown
