From mboxrd@z Thu Jan 1 00:00:00 1970 From: NeilBrown Subject: Re: Problem with patch: "reject a re-add request that cannot be honoured" (commit bedd86b7773fd97f0d708cc0c371c8963ba7ba9a) Date: Wed, 27 Jun 2012 17:22:55 +1000 Message-ID: <20120627172255.40ee3a19@notabene.brown> References: <20120607212434.4d6211d6@notabene.brown> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/ethdYtXCGcnIU9+3go57jKV"; protocol="application/pgp-signature" Return-path: In-Reply-To: Sender: linux-raid-owner@vger.kernel.org To: Alexander Lyakas Cc: andrey.warkentin@gmail.com, linux-raid List-Id: linux-raid.ids --Sig_/ethdYtXCGcnIU9+3go57jKV Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Thu, 7 Jun 2012 15:47:46 +0300 Alexander Lyakas wrote: > Thanks for commenting, Neil, >=20 > On Thu, Jun 7, 2012 at 2:24 PM, NeilBrown wrote: > > On Thu, 7 Jun 2012 10:22:24 +0300 Alexander Lyakas > > wrote: > > > >> Hi again Neil, and Andrey, > >> > >> looking at this email thread: > >> http://www.spinics.net/lists/raid/msg36236.html > >> between you and Andrey, the conclusion was: > >> "So the correct thing to do is to *not* update the metadata on the > >> recovering device until recovery completes. =A0Then if it fails and is > >> re-added, it will look just the same as when it was re-added the first > >> time, and will do a bitmap-based recovery." > >> > >> I have two doubts about this decision: > >> 1) Since the event count on the recovering drive is not updated, this > >> means that after reboot, when array is re-assembled, this drive will > >> not be added to the array, and the user will have to manually re-add > >> it. I agree this is a minor thing. > > > > Still, if it can be fixed it should be. > > > >> 2) There are places in mdadm, which check for recovery_offset on the > >> drive and take decisions based upon that. Specifically, if there is > >> *no* recovery offset, the data on this drive is considered to be > >> consistent WRT to the failure time of the drive. So, for example, the > >> drive can be a candidate for bumping up events during "force > >> assembly". Now, when superblock on such drive is not updated during > >> recovery (so there is *no* recovery offset), mdadm will think that the > >> drive is consistent, while in fact, its data is totally unusable until > >> after recovery completes. That's because we have updated parts of the > >> drive, but did not complete bringing the whole drive in-sync. > > > > If mdadm would consider updating the event count if not recovery had st= arted, > > then surely it is just as valid to do so once some recovery has started= , even > > if it hasn't completed. >=20 > The patch you accepted from me ("Don't consider disks with a valid > recovery offset as candidates for bumping up event count") actually > attempts to protect from that:) >=20 > I don't understand why "it is just as valid to do so once some > recovery has started". My understanding is that once recovery of a > drive has started, its data is not consistent between different parts > of the drive, until the recovery completes. This is because md does > bitmap-based recovery, and not kind of journal/transaction-log based > recovery. >=20 > However, one could argue that for force-assembly case, when data > anyways can come up as partially corrupted, this is less important. Exactly. And mdadm only updates event counts in the force-assembly case so while it might not be ideal, it is the best we can do. >=20 > I would still think that there is value in recoding in a superblock > that a drive is recovering. Probably. It is a bit unfortunate that if you stop an array that is recovering after a --re-add, you cannot simply 'assemble' it again and get it back to the same state. I'll think more on that. Meanwhile, this patch might address your other problem. It allows --re-add to work if a non-bitmap rebuild fails and is then re-added. diff --git a/drivers/md/md.c b/drivers/md/md.c index c601c4b..d31852e 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5784,7 +5784,7 @@ static int add_new_disk(struct mddev * mddev, mdu_dis= k_info_t *info) super_types[mddev->major_version]. validate_super(mddev, rdev); if ((info->state & (1<flags) || + (test_bit(Faulty, &rdev->flags) || rdev->raid_disk !=3D info->raid_disk)) { /* This was a hot-add request, but events doesn't * match, so reject it. >=20 > > > >> > >> Can you pls comment on my doubts? > > > > I think there is probably room for improvement here but I don't think t= here > > are any serious problems. > > > > However I'm about to go on leave for a couple of week so I'm unlikely to > > think about it for a while. I've made a note to look at it properly whe= n I > > get back. > > >=20 > Indeed, don't you think about this while you are resting! :-) Thanks. I did have a very enjoyable break. NeilBrown --Sig_/ethdYtXCGcnIU9+3go57jKV Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIVAwUBT+q00Dnsnt1WYoG5AQIEOA/8CULDjRqwnbjmocP0SeMBI8Do2bM0XkmH k9M8ziu+PNppvm1L1zrZ6GY1SWbqpwXLgYDMtNBnLuHGrAAihKL0pGDzesd+jJ8s rfTqKC/zotqBH2haKLciKLb7KyhWYFKUPt/tnwyipzKlavN59Sm2jpIvEqBYhWlK AgfdSN8EIysLjo4xaxfVPepE9m1u333ucdF5LEQcnc5kwq7KAYyIP7Uh0k3Jj98k KS9jiBHh7oGyDtevNV9oCXLUxjfBQtZFv17Sc3aZTfAXl1pJe/wuu/IEdVPwrsDE tp+piZddo2gHvHuEkWOGOSsvmnOQZ5Bhs9nHnXWhTUJXX0Z02Layou3xmtSEs2Bl ROUIbL3eJgbGTQvZLkyD8W7jxZu50mlGFNv6wNkYEdCxlutkbbZo8ZeXwFIzru10 gqkdWB7ghEWRUyaDnDBOJwtaPGU8ck+mwEBg9L8yfWOsGNhhiTBj2F2/jisrsL9S bR2Sf4408ggyaJ61nZuVvLIN1d09gaIqiJTB7Q2lsGxqSDxdOl+X/z8dtO3JEa9h kM6MuiLI3CG+n82KQYesK2n6Ym5h06cz2Zw1O4/qbbK8v/HrtinaccwXol2MPJL7 /ix0IpiT84+rRjtxfKe4eNCqO4Ph5HYlnzmqCbPws0kdtr+tCwFh0cGV0tsg7MzK qpvdq5f9EZU= =gg0/ -----END PGP SIGNATURE----- --Sig_/ethdYtXCGcnIU9+3go57jKV--