From: NeilBrown
Subject: Re: Are we forced to use bad blocks list?
Date: Thu, 7 Aug 2014 12:27:42 +1000
To: Ethan Wilson
Cc: linux-raid

On Mon, 04 Aug 2014 14:37:59 +0200 Ethan Wilson wrote:

> On 04/08/2014 03:38, NeilBrown wrote:
> > On Thu, 31 Jul 2014 16:31:28 +0200 Ethan Wilson
> > wrote:
> >
> >> Dear MD developers,
> >> it seems that with mdadm 3.3.1, if an array has bad blocks disabled
> >> ....
> >> array is configured for BBL or not, and add a spare of the same type.
> >>
> > Why don't you want bad-block-lists?
> >
> > I'm not necessarily against having some way to avoid getting them
> > automatically ... possibly a 'policy' option in mdadm.conf.
> > But I'd like to make sure I understand all of your thinking first.
> >
> > Thanks,
> > NeilBrown
>
> Hello Neil,
>
> Well... on the ML, I think that we saw the badblocks code triggered only
> once, and it was with the recent thread of Pedro Teixeira.
>
> It seemed to me that his error condition could indicate that there might
> be a bug in the bad blocks code. It's not clear to me how those zillions
> of bad sectors could have been stored without some bug such as an
> erroneous propagation of bad blocks, or erroneous handling of degraded
> mode (he said he operated with a doubly degraded raid6 after 3 disks
> dropped out).
>
> Additionally, when he did fsck, that should have cleared the bad blocks
> which were being written over, but he said that
> "When doing a fsck.ext4 of /dev/md0 it returns the following ( and I can
> do it over and over again with the exact same errors) ..... "
> I think 'exact same errors' is not supposed to happen if I understand
> the intent of BBL correctly.
>
> So, I can't be sure, but I have the feeling it's possible that there are
> still a few bugs in the BBL code. MD RAID in general is very stable and
> I really like it so much, but maybe on production systems I'd keep the
> BBL disabled still for a while, if possible.
>

Fair enough. Thanks for the explanation.

http://git.neil.brown.name/?p=mdadm.git;a=commitdiff;h=e2efe9e7bc73307f74a4c2e2197d6d4498dd46f0

will be in mdadm-3.3.2.

NeilBrown