From: Vladimir Sementsov-Ogievskiy
Date: Thu, 29 Mar 2018 18:09:23 +0300
Subject: Re: [Qemu-devel] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
To: Max Reitz, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, den@openvz.org
Message-ID: <50eaf53c-871e-6081-7c4d-4d8de9b0d87f@virtuozzo.com>
In-Reply-To: <1870fc0d-0c36-f983-8496-3cb5119851ba@redhat.com>
References: <20180320170521.32152-1-vsementsov@virtuozzo.com>
 <4add3400-b4d4-4812-72f2-0f184b2f4fd6@virtuozzo.com>
 <2520cd8d-990b-e7b9-c7b6-0b345e414ce8@virtuozzo.com>
 <48dbaef0-0a9f-de65-9c7d-584021fc4759@redhat.com>
 <1870fc0d-0c36-f983-8496-3cb5119851ba@redhat.com>

29.03.2018 17:03, Max Reitz wrote:
> On 2018-03-29 10:08, Vladimir Sementsov-Ogievskiy wrote:
>> 28.03.2018 17:53, Max Reitz wrote:
>>> On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote:
> [...]
>
>>>> isn't it because of a lot of cat processes? will check, update the loop to
>>>> i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; done
>>> Hmm...  I know I tried to kill all of the cats, but for some reason that
>>> didn't really help yesterday.  Seems to help now, for 2.12.0-rc0 at
>>> least (that is, before this series).
>> reproduced with killing... (without this series, just on master)
>>
>>> After the whole series, I still get a lot of failures in 169
>>> (mismatching bitmap hash, mostly).
>>>
>>> And interestingly, if I add an abort():
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 486f3e83b7..9204c1c0ac 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1481,6 +1481,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>>      }
>>>
>>>      if (bdrv_dirty_bitmap_next(bs, NULL)) {
>>> +        abort();
>>>          /* It's some kind of reopen with already existing dirty bitmaps. There
>>>           * are no known cases where we need loading bitmaps in such situation,
>>>           * so it's safer don't load them.
>>>
>>> Then this fires for a couple of test cases of 169 even without the third
>>> patch of this series.
>>>
>>> I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration
>>> adds or something?  Then this would be the wrong condition, because I
>>> guess we still want to load the bitmaps that are in the qcow2 file.
>>>
>>> I'm not sure whether bdrv_has_readonly_bitmaps() is the correct
>>> condition then, either, though.  Maybe let's take a step back: We want
>>> to load all the bitmaps from the file exactly once, and that is when it
>>> is opened the first time.  Or that's what I would have thought...  Is
>>> that even correct?
>>>
>>> Why do we load the bitmaps when the device is inactive anyway?
>>> Shouldn't we load them only once the device is activated?
>> Hmm, not sure. Maybe we don't need to. But we do need to load them when
>> opening read-only, and we should correspondingly reopen them in that case.
> Yeah, well, yeah, but the current state seems just wrong.  Apparently
> there are cases where a qcow2 node may have bitmaps before we try to
> load them from the file, so the current condition doesn't work.
>
> Furthermore, it seems like the current "state machine" is too complex, so
> we no longer know which cases are possible and what to do when.
>
> So the first thing we could do is add a field to the BDRVQCow2State that
> tells us whether the bitmaps have been loaded already or not.  If not,
> we invoke qcow2_load_dirty_bitmaps() and set the value to true.  If the
> value was true already and the BDS is active and R/W now, we call
> qcow2_reopen_bitmaps_rw_hint().  That should solve one problem.

Good idea, will do.
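Roughly what I understand you to suggest (just a sketch, not a real patch:
the flag name dirty_bitmaps_loaded is mine, "active and R/W" is my guess via
BDRV_O_INACTIVE and bdrv_is_read_only(), I'm also guessing the exact
parameters of qcow2_reopen_bitmaps_rw_hint(), and error handling is omitted):

/* new field in BDRVQCow2State (block/qcow2.h): */
bool dirty_bitmaps_loaded;   /* bitmaps already loaded from this image? */

/* in qcow2_do_open(), instead of the bdrv_dirty_bitmap_next() check: */
BDRVQCow2State *s = bs->opaque;
Error *local_err = NULL;
bool header_updated = false;

if (!s->dirty_bitmaps_loaded) {
    /* First open of the image: load persistent bitmaps exactly once. */
    qcow2_load_dirty_bitmaps(bs, &local_err);
    s->dirty_bitmaps_loaded = true;
} else if (!(bs->open_flags & BDRV_O_INACTIVE) && !bdrv_is_read_only(bs)) {
    /* Bitmaps were already loaded by an earlier (inactive or read-only)
     * open, and the node is active and writable now: switch them to R/W. */
    qcow2_reopen_bitmaps_rw_hint(bs, &header_updated, &local_err);
}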
>
> The other problem of course is the question whether we should call
> qcow2_load_dirty_bitmaps() at all while the drive is still inactive.
> You know the migration model better than me, so I'm asking this question
> to you.  We can phrase it differently: Do we need to load the bitmaps
> before the drive is activated?

Now I think that we don't. At least, we don't have such cases in
Virtuozzo (I hope :).

Why I doubted:

1. We have cases where we want to start the VM inactive, so that we can
export its drive over NBD, push some data into it and only then start
the VM (which involves activating it).
2. We have cases where we want to start the VM stopped and operate on
dirty bitmaps.

If we simply opened all images in inactive mode until the VM starts, it
would look like we need bitmaps in inactive mode (for 2.). But that looks
like the wrong approach anyway.

At first I tried to solve (1.) by simply calling inactivate_all() when
the VM is started in paused mode, but that breaks at least (2.), so in
the end I solved (1.) with an approach similar to "-incoming defer". So
we have inactive mode in two cases:

 - incoming migration
 - pushing data into the VM before it starts

and in these cases we don't need to load dirty bitmaps.

There is also an inconsistency: we currently remove persistent bitmaps on
inactivate, so it is inconsistent to load them in inactive mode.

OK, I'll try to respin.

>
> Max
>

About 169: how often is it reproducible for you?

-- 
Best regards,
Vladimir