From mboxrd@z Thu Jan 1 00:00:00 1970 From: Benjamin Marzinski Subject: Re: [PATCH 2/7] multipathd: fix check_path errors with removed map Date: Fri, 19 Jun 2020 11:52:36 -0500 Message-ID: <20200619165236.GP5894@octiron.msp.redhat.com> References: <1592439867-18427-1-git-send-email-bmarzins@redhat.com> <1592439867-18427-3-git-send-email-bmarzins@redhat.com> <20200618231711.GM5894@octiron.msp.redhat.com> <6f9284eeec0dda69e8e4dd4b2858f02d3c31fb8b.camel@suse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <6f9284eeec0dda69e8e4dd4b2858f02d3c31fb8b.camel@suse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com Content-Disposition: inline To: Martin Wilck Cc: "dm-devel@redhat.com" List-Id: dm-devel.ids On Fri, Jun 19, 2020 at 01:42:47PM +0000, Martin Wilck wrote: > On Thu, 2020-06-18 at 18:17 -0500, Benjamin Marzinski wrote: > > On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote: > > > On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote: > > > >=20 > > > > static void > > > > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct > > > > path > > > > * pp, unsigned int ticks) > > > > =09 * Synchronize with kernel state > > > > =09 */ > > > > =09if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) { > > > > +=09=09struct dm_info info; > > > > =09=09condlog(1, "%s: Could not synchronize with kernel > > > > state", > > > > =09=09=09pp->dev); > > > > +=09=09if (pp->mpp && pp->mpp->alias && > > > > +=09=09 do_dm_get_info(pp->mpp->alias, &info) =3D=3D 0) > > > > +=09=09=09/* multipath device missing. Likely removed */ > > > > +=09=09=09return 0; > > > > =09=09pp->dmstate =3D PSTATE_UNDEF; > > >=20 > > > It would be more elegant if we could distinguish different failure > > > modes from update_multipath_strings() directly, without having to > > > call > > > do_dm_get_info() again. > >=20 > > Seems reasonable. I'll take a look at that. >=20 > Yet another idea: I just discussed this with Hannes, and he pointed out > that right below this code, we have >=20 > =09/* if update_multipath_strings orphaned the path, quit early */ > =09if (!pp->mpp) > =09=09return 0; >=20 > ... which could have the same effect, if pp->mpp was reloaded. Probably > that doesn't happen because the pp->mpp dereference is cached in a > register, but we could simply add a READ_ONCE there. When update_multipath_strings() calls update_multipath_table() it will fail because the table no longer exists. If we differentiate between a dm error and not finding the map, we can exit early without having to call do_dm_get_info() again. But right now, if the map is gone, but we haven't got the uevent removing it, then nothing will clear pp->mpp. If we did get the uevent, then it must have grabbed the vecs lock. That better have caused a memory barrier, which will guarantee that check_path() sees the updated value. -Ben =20 > Choose what you prefer. >=20 > Regards, > Martin >=20 > --=20 > Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 > SUSE Software Solutions Germany GmbH > HRB 36809, AG N=FCrnberg GF: Felix > Imend=F6rffer >=20