From: "Denis V. Lunev" <den@openvz.org>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>,
Peter Maydell <peter.maydell@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu list <qemu-devel@nongnu.org>,
Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PULL 2/5] migration: move bdrv_invalidate_cache_all of of coroutine context
Date: Tue, 8 Mar 2016 13:54:33 +0300 [thread overview]
Message-ID: <56DEAF69.1070607@openvz.org> (raw)
In-Reply-To: <20160308104558.GA2251@work-vm>
On 03/08/2016 01:45 PM, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@openvz.org) wrote:
>> On 03/07/2016 03:49 PM, Dr. David Alan Gilbert wrote:
>>> * Amit Shah (amit.shah@redhat.com) wrote:
>>>> From: "Denis V. Lunev" <den@openvz.org>
>>>>
>>>> There is a possibility to hit an assert in qcow2_get_specific_info that
>>>> s->qcow_version is undefined. This happens when VM in starting from
>>>> suspended state, i.e. it processes incoming migration, and in the same
>>>> time 'info block' is called.
>>>>
>>>> The problem is that qcow2_invalidate_cache() closes the image and
>>>> memset()s BDRVQcowState in the middle.
>>>>
>>>> The patch moves processing of bdrv_invalidate_cache_all out of
>>>> coroutine context for postcopy migration to avoid that. This function
>>>> is called with the following stack:
>>>> process_incoming_migration_co
>>>> qemu_loadvm_state
>>>> qemu_loadvm_state_main
>>>> loadvm_process_command
>>>> loadvm_postcopy_handle_run
>>>>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> hmm; actually - this segs in a variety of different ways;
>>> there are two problems:
>>>
>>> a) + bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
>>> That's the easy one; that NULL should be 'mis', because
>>> the bh is expecting to use it as a MigrationIncomingState
>>> so it segs fairly reliably in the qemu_bh_delete(mis->bh)
>>>
>>> b) The harder problem is that there's a race where qemu_bh_delete
>>> segs, and I'm not 100% sure why yet - it only does it sometime
>>> (i.e. run virt-test and leave it and it occasionally does it).
>>> From the core it looks like qemu->bh is corrupt (0x10101010...)
>>> so maybe mis has been freed at that point?
>>> I'm suspecting this is the postcopy_ram_listen_thread freeing
>>> mis at the end of it, but I don't know yet.
>>>
>>> Dave
>> Yes. this is exactly use-after-free. I have looked into the code
>> and this seems correct.
>>
>> Could you try this simple patch?
> Hmm no, that's not right.
> The order for postcopy is that we are running the listen thread and then
> receive the 'run', and the listening thread is still running - so you
> can't destroy the incoming state during the run.
> It can't get destroyed until both the main thread has finished loading
> the migration AND the listen thread has finished.
>
> Hmm - that does give me an idea about the other seg I saw; I need to check it;
> but I think the problem is probably the case of a very short postcopy
> where the listen thread exits before the handle_run_bh is triggered;
> (and since I've only seen it in my virt-test setup, and I know it can do
> very short postcopies)
> I think the fix here is to pass loadvm_postcopy_handle_run_bh a pointer to it's
> own bh structure rather than store it in mis->bh; that way it doesn't use mis
> at all.
>
> Dave
>
>> Den
>>
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 96e7db5..9a020ef 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1446,15 +1446,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>
>> migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> MIGRATION_STATUS_COMPLETED);
>> - /*
>> - * If everything has worked fine, then the main thread has waited
>> - * for us to start, and we're the last use of the mis.
>> - * (If something broke then qemu will have to exit anyway since it's
>> - * got a bad migration state).
>> - */
>> - migration_incoming_state_destroy();
>> -
>> -
>> return NULL;
>> }
>>
>> @@ -1533,6 +1524,14 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>> }
>>
>> qemu_bh_delete(mis->bh);
>> +
>> + /*
>> + * If everything has worked fine, then the main thread has waited
>> + * for us to start, and we're the last use of the mis.
>> + * (If something broke then qemu will have to exit anyway since it's
>> + * got a bad migration state).
>> + */
>> + migration_incoming_state_destroy();
>> }
>>
>> /* After all discards we can start running and asking for pages */
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
This will help for sure. The idea to reuse migration state seems wrong.
Den
next prev parent reply other threads:[~2016-03-08 10:54 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-23 7:30 [Qemu-devel] [PULL 0/5] migration pull Amit Shah
2016-02-23 7:30 ` [Qemu-devel] [PULL 1/5] migration: move bdrv_invalidate_cache_all of of coroutine context Amit Shah
2016-02-23 7:30 ` [Qemu-devel] [PULL 2/5] " Amit Shah
2016-03-07 12:49 ` Dr. David Alan Gilbert
2016-03-07 13:30 ` Paolo Bonzini
2016-03-07 18:06 ` Dr. David Alan Gilbert
2016-03-07 18:58 ` Denis V. Lunev
2016-03-08 10:45 ` Dr. David Alan Gilbert
2016-03-08 10:54 ` Denis V. Lunev [this message]
2016-02-23 7:30 ` [Qemu-devel] [PULL 3/5] migration: reorder code to make it symmetric Amit Shah
2016-02-23 7:30 ` [Qemu-devel] [PULL 4/5] configure: detect ifunc and avx2 attribute Amit Shah
2016-02-23 7:30 ` [Qemu-devel] [PULL 5/5] cutils: add avx2 instruction optimization Amit Shah
2016-02-23 9:09 ` [Qemu-devel] [PULL 0/5] migration pull Peter Maydell
2016-02-23 9:38 ` Amit Shah
2016-02-23 9:48 ` Paolo Bonzini
2016-02-23 10:43 ` Peter Maydell
2016-02-23 11:18 ` Li, Liang Z
2016-02-23 11:25 ` Peter Maydell
2016-02-23 14:04 ` Paolo Bonzini
2016-02-24 9:27 ` Li, Liang Z
2016-03-08 4:23 ` Amit Shah
2016-03-08 4:28 ` Li, Liang Z
2016-02-23 9:55 ` Li, Liang Z
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56DEAF69.1070607@openvz.org \
--to=den@openvz.org \
--cc=amit.shah@redhat.com \
--cc=dgilbert@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).