qemu-devel.nongnu.org archive mirror
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, jsnow@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
Date: Wed, 28 Mar 2018 16:53:57 +0200
Message-ID: <48dbaef0-0a9f-de65-9c7d-584021fc4759@redhat.com>
In-Reply-To: <2520cd8d-990b-e7b9-c7b6-0b345e414ce8@virtuozzo.com>

On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote:
> 27.03.2018 12:53, Vladimir Sementsov-Ogievskiy wrote:
>> 27.03.2018 12:28, Vladimir Sementsov-Ogievskiy wrote:
>>> 26.03.2018 21:06, Max Reitz wrote:
>>>> On 2018-03-20 18:05, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all.
>>>>>
>>>>> This fixes bitmaps migration through shared storage. Look at 02 for
>>>>> details.
>>>>>
>>>>> The bug was introduced in 2.10 with the whole qcow2 bitmaps feature, so
>>>>> qemu-stable in CC. However I doubt that someone really suffered
>>>>> from this.
>>>>>
>>>>> Do we need dirty bitmaps at all in the inactive case? That was a
>>>>> question in v2.
>>>>> Keeping in mind that we are going to use inactive mode not only for
>>>>> incoming migration, I'm not sure that the answer is NO (though it may
>>>>> be "NO" for 2.10 and 2.11), so let's fix it in the manner proposed
>>>>> here, at least for 2.12.
>>>> For some reason, I can't get 169 to work now at all[1]. What's more,
>>>> whenever I run it, two (on current master, maybe more after this
>>>> series) "cat $TEST_DIR/mig_file" processes stay around.  That doesn't
>>>> seem right.
>>>>
>>>> However, this series doesn't seem to make it worse[2]...  So I'm
>>>> keeping it.  I suppose it's just some issue with the test.
>>>>
>>>> Max
>>>>
>>>>
>>>> [1] Sometimes there are migration event timeouts, sometimes just VM
>>>> launch timeouts (specifically when VM B is supposed to be re-launched
>>>> just after it has been shut down), and sometimes I get a dirty bitmap
>>>> hash mismatch.
>>>>
>>>>
>>>> [2] The whole timeline was:
>>>>
>>>> - Apply this series, everything seems alright
>>>>
>>>> (a couple of hours later)
>>>> - Test some other things, stumble over 169 once or so
>>>>
>>>> - Focus on 169, fails a bit more often
>>>>
>>>> (today)
>>>> - Can't get it to work at all
>>>>
>>>> - Can't get it to work in any version, neither before nor after this
>>>> patch
>>>>
>>>> - Lose my sanity
>>>>
>>>> - Write this email
>>>>
>>>> O:-)
>>>>
>>>
>>> hmm.. checked on current master (7b93d78a04aa24), tried a lot of
>>> times in a loop, works for me. How can I help?
>>>
>>
>> Oh, the loop finally finished, with:
>>
>> 169 6s ... [failed, exit status 1] - output mismatch (see 169.out.bad)
>> --- /work/src/qemu/master/tests/qemu-iotests/169.out    2018-03-16
>> 21:01:19.536765587 +0300
>> +++ /work/src/qemu/master/tests/qemu-iotests/169.out.bad 2018-03-27
>> 12:33:03.804800350 +0300
>> @@ -1,5 +1,20 @@
>> -........
>> +......E.
>> +======================================================================
>> +ERROR: test__persistent__not_migbitmap__offline
>> (__main__.TestDirtyBitmapMigration)
>> +methodcaller(name, ...) --> methodcaller object
>> +----------------------------------------------------------------------
>> +Traceback (most recent call last):
>> +  File "169", line 129, in do_test_migration
>> +    self.vm_b.event_wait("RESUME", timeout=10.0)
>> +  File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qemu.py", line
>> 349, in event_wait
>> +    event = self._qmp.pull_event(wait=timeout)
>> +  File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qmp/qmp.py",
>> line 216, in pull_event
>> +    self.__get_events(wait)
>> +  File
>> "/work/src/qemu/master/tests/qemu-iotests/../../scripts/qmp/qmp.py",
>> line 124, in __get_events
>> +    raise QMPTimeoutError("Timeout waiting for event")
>> +QMPTimeoutError: Timeout waiting for event
>> +
>>  ----------------------------------------------------------------------
>>  Ran 8 tests
>>
>> -OK
>> +FAILED (errors=1)
>> Failures: 169
>> Failed 1 of 1 tests
>>
>>
>> and I have a lot of opened pipes, like:
>>
>> root       18685  0.0  0.0 107924   352 pts/0    S    12:19   0:00 cat
>> /work/src/qemu/master/tests/qemu-iotests/scratch/mig_file
>>
>> ...
>>
>> restart testing loop, it continues to pass 169 again and again...
>>
> 
> .... and,
> 
> --- /work/src/qemu/master/tests/qemu-iotests/169.out    2018-03-16
> 21:01:19.536765587 +0300
> +++ /work/src/qemu/master/tests/qemu-iotests/169.out.bad 2018-03-27
> 12:58:44.804894014 +0300
> @@ -1,5 +1,20 @@
> -........
> +F.......
> +======================================================================
> +FAIL: test__not_persistent__migbitmap__offline
> (__main__.TestDirtyBitmapMigration)
> +methodcaller(name, ...) --> methodcaller object
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "169", line 136, in do_test_migration
> +    self.check_bitmap(self.vm_b, sha256 if persistent else False)
> +  File "169", line 77, in check_bitmap
> +    "Dirty bitmap 'bitmap0' not found");
> +  File "/work/src/qemu/master/tests/qemu-iotests/iotests.py", line 422,
> in assert_qmp
> +    result = self.dictpath(d, path)
> +  File "/work/src/qemu/master/tests/qemu-iotests/iotests.py", line 381,
> in dictpath
> +    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
> +AssertionError: failed path traversal for "error/desc" in "{u'return':
> {u'sha256':
> u'01d2ebedcb8f549a2547dbf8e231c410e3e747a9479e98909fc936e0035cf8b1'}}"
> +
>  ----------------------------------------------------------------------
>  Ran 8 tests
> 
> -OK
> +FAILED (failures=1)
> Failures: 169
> Failed 1 of 1 tests
> 
> 
> Isn't it because of all the leftover cat processes? I will check; I am
> updating the loop to
> i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; done

Hmm...  I know I tried to kill all of the cats, but for some reason that
didn't really help yesterday.  Seems to help now, for 2.12.0-rc0 at
least (that is, before this series).
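
A narrower cleanup than "killall -9 cat" could target only the leftover
"cat .../mig_file" processes. The following is a minimal sketch under
stated assumptions, not part of test 169: the helper names are
hypothetical, it is Linux-only (it reads /proc), and it treats any "cat"
whose single argument is a file named mig_file as a leftover.

```python
import os
import signal


def is_stray_cat(argv):
    """True if argv looks like `cat <some dir>/mig_file` (hypothetical rule)."""
    return (len(argv) == 2
            and os.path.basename(argv[0]) == "cat"
            and os.path.basename(argv[1]) == "mig_file")


def read_argv(pid):
    """Return the argv of a process from /proc, or [] if it cannot be read."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            raw = f.read()
    except OSError:
        return []
    # /proc/<pid>/cmdline is a NUL-separated list of arguments.
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


def kill_stray_cats():
    """SIGKILL every leftover `cat .../mig_file`; return the killed pids."""
    killed = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        pid = int(name)
        if is_stray_cat(read_argv(pid)):
            try:
                os.kill(pid, signal.SIGKILL)
                killed.append(pid)
            except OSError:
                pass  # already exited, or not ours to kill
    return killed
```

Calling kill_stray_cats() between test runs would leave unrelated cat
processes on the machine alone.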

After the whole series, I still get a lot of failures in 169
(mismatching bitmap hash, mostly).

And interestingly, if I add an abort():

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9204c1c0ac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1481,6 +1481,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     }

     if (bdrv_dirty_bitmap_next(bs, NULL)) {
+        abort();
         /* It's some kind of reopen with already existing dirty bitmaps. There
          * are no known cases where we need loading bitmaps in such situation,
          * so it's safer don't load them.

Then this fires for a couple of test cases of 169 even without the third
patch of this series.

I guess bdrv_dirty_bitmap_next() finds some bitmaps here that migration
has already added, or something like that?  Then this would be the wrong
condition, because I guess we still want to load the bitmaps that are in
the qcow2 file.

I'm not sure whether bdrv_has_readonly_bitmaps() is the correct
condition then, either, though.  Maybe let's take a step back: We want
to load all the bitmaps from the file exactly once, and that is when it
is opened the first time.  Or that's what I would have thought...  Is
that even correct?

Why do we load the bitmaps when the device is inactive anyway?
Shouldn't we load them only once the device is activated?
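
To make the question concrete, here is a toy model in Python. It is not
QEMU code: ToyImage and both predicates are hypothetical, and it assumes
the guess above is right, namely that migration can add a bitmap before
the file's own bitmaps have been loaded. It contrasts "skip loading if any
bitmap exists" with "load exactly once, tracked by an explicit flag, on
activation".

```python
class ToyImage:
    """Toy stand-in for a qcow2 node; purely illustrative."""

    def __init__(self, stored_bitmaps):
        self.stored_bitmaps = list(stored_bitmaps)  # bitmaps in the file
        self.bitmaps = []               # bitmaps currently in memory
        self.loaded_from_file = False   # explicit "already loaded" flag
        self.load_count = 0

    def add_migrated_bitmap(self, name):
        # Assumption: migration may create a bitmap before activation.
        self.bitmaps.append(name)

    def activate(self):
        # Load the stored bitmaps on first activation only.
        if should_load_by_flag(self):
            self.bitmaps.extend(self.stored_bitmaps)
            self.loaded_from_file = True
            self.load_count += 1


def should_load_by_presence(img):
    """Mimics the 'no bitmap exists yet' style of check."""
    return not img.bitmaps


def should_load_by_flag(img):
    """Load if and only if the file's bitmaps were never loaded."""
    return not img.loaded_from_file


img = ToyImage(["bitmap0"])
img.add_migrated_bitmap("mig0")
skipped_by_presence = not should_load_by_presence(img)
img.activate()
img.activate()  # a second activation must not load again
```

In this model the presence-based check would skip loading bitmap0 because
mig0 already exists, while the flag-based check loads it once. Whether
stored and migrated bitmaps should coexist like this in the real code is
exactly the open question.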

Max

