From mboxrd@z Thu Jan 1 00:00:00 1970
From: NeilBrown
Subject: Re: Troubleshooting "Buffer I/O error" on reading md device
Date: Fri, 02 Feb 2018 12:55:53 +1100
Message-ID: <87inbg2lzq.fsf@notabene.neil.brown.name>
References: <1z_MZ4Xqld_IRMUbGJE66v2VUhXkBhlHnWJEfLASWNcv5s3Wo3A1YeuQBJBuksxJtFPpmsPbg1_F8PC3Sj4HrzL6Go3aIanVihzcC-4ZHEQ=@protonmail.com>
 <87373og9z9.fsf@notabene.neil.brown.name>
 <87r2r8dk80.fsf@notabene.neil.brown.name>
 <871sj5dsiv.fsf@notabene.neil.brown.name>
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha256; protocol="application/pgp-signature"
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: RQM
Cc: "linux-raid@vger.kernel.org"
List-Id: linux-raid.ids

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On Sat, Jan 13 2018, RQM wrote:

> Hello,
>
> I have been made aware that the link I had supplied previously does not work anymore.
> Here's another attempt at uploading the `mdadm --dump /dev/sd[bcdef]3` output:
>
> https://filebin.net/i0olmgzg52obnp0f/dump.tgz
>
> Any help is greatly appreciated. Please do let me know whether you plan on working on this issue in the near future, because otherwise I will have to re-create a new array on these disks in order to put them into production again.
>
> Thank you so much!

Sorry that it has taken me so long to get to this - January was a bit crazy.

Short answer is that if you use
  --assemble --update=force-no-bbl
it will really truly get rid of the bad block log. I really should add that to the man page.

Longer answer: If you assemble the array (without force-no-bbl) and
  grep . /sys/block/md0/md/rd*/bad_blocks
you'll get
  /sys/block/md0/md/rd2/bad_blocks:3196060416 8
  /sys/block/md0/md/rd3/bad_blocks:3196060416 8

Each entry is a start sector and a length in 512-byte sectors, so that is a 4K block (8 sectors) that is bad at the same location on 2 devices.
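As a rough illustration of how those two entries can be read, here is a minimal Python sketch. It assumes each bad_blocks entry is "start length" in 512-byte sectors, which matches the 4K figure above; the helper name parse_bad_blocks is made up for this example and is not part of mdadm or the kernel.

```python
SECTOR_BYTES = 512  # assumption: bad_blocks entries count 512-byte sectors


def parse_bad_blocks(line):
    """Split 'path:start length' into (device, start_sector, length_sectors)."""
    path, entry = line.rsplit(":", 1)
    start, length = (int(field) for field in entry.split())
    device = path.split("/")[-2]  # directory name just above 'bad_blocks', e.g. 'rd2'
    return device, start, length


# The two lines of grep output quoted in the message.
lines = [
    "/sys/block/md0/md/rd2/bad_blocks:3196060416 8",
    "/sys/block/md0/md/rd3/bad_blocks:3196060416 8",
]

entries = [parse_bad_blocks(line) for line in lines]
for device, start, length in entries:
    print(f"{device}: {length * SECTOR_BYTES} bytes at sector {start}")

# True when every device reports the same (start, length) range.
same_range = len({(start, length) for _, start, length in entries}) == 1
```

Here same_range ends up True, which is the "same location on 2 devices" observation.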
There is no data offset, and the chunk size is 64K (128 sectors), so using bc:

  % bc
  3196060416/(64*2)
  24969222
  3196060416%(64*2)
  0

the blocks are at the start of stripe 24969222. Each stripe is 4 data chunks, and a chunk is 64K or 16 4K blocks. So the block offset is close to

  % bc
  24969222*4*16
  1598030208

which is exactly the "logical block" which was reported.

There are 5 devices, so the parity block rotates through the pattern

  D0 D1 D2 D3 P
  D1 D2 D3 P  D0
  D2 D3 P  D0 D1
  D3 P  D0 D1 D2
  P  D0 D1 D2 D3

  % bc
  24969222%5
  2

So this should be row 2 (counting from 0)

  D2 D3 P D0 D1

rd2 and rd3 are bad, so that is 'P' and 'D0'. So this confirms that it is just the first 4K block of that stripe which is bad. Writing should fix it... but it doesn't. The write gets an IO error.

Looking at the code I can see why. The fix isn't completely trivial. I'll have to think about it carefully.

But for now
  --update=force-no-bbl
should get you going.

NeilBrown

--=-=-=--
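The stripe arithmetic in the message can also be checked with a short Python sketch instead of bc. It assumes the figures given above (64K chunks, 5 devices, no data offset, 512-byte sectors). The function names are invented for this example, and the rotation rule is inferred from the five-row pattern shown in the message, not taken from the md source.

```python
SECTORS_PER_CHUNK = 64 * 2   # 64K chunk expressed in 512-byte sectors
BLOCKS_PER_CHUNK = 16        # 4K blocks in one 64K chunk
DEVICES = 5
DATA_CHUNKS = DEVICES - 1    # one chunk per stripe holds parity


def locate(sector):
    """Map a per-device sector to (stripe, offset, first logical 4K block, row)."""
    stripe, offset = divmod(sector, SECTORS_PER_CHUNK)
    logical_block = stripe * DATA_CHUNKS * BLOCKS_PER_CHUNK
    row = stripe % DEVICES
    return stripe, offset, logical_block, row


def row_layout(row):
    """Return what each device holds in a given row of the rotation pattern."""
    parity_dev = DEVICES - 1 - row          # parity moves one device left per row
    layout = [None] * DEVICES
    layout[parity_dev] = "P"
    for i in range(DATA_CHUNKS):
        # data chunks start just after the parity device and wrap around
        layout[(parity_dev + 1 + i) % DEVICES] = f"D{i}"
    return layout


stripe, offset, logical_block, row = locate(3196060416)
layout = row_layout(row)
print(stripe, offset, logical_block, row)
print(" ".join(layout))
```

With the sector from the bad block log this gives stripe 24969222, offset 0, logical block 1598030208 and row 2, and the row comes out as D2 D3 P D0 D1, so devices 2 and 3 hold 'P' and 'D0'.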