From: Igor Redko <redkoi@virtuozzo.com>
To: Wen Congyang <wency@cn.fujitsu.com>, "Denis V. Lunev" <den@openvz.org>
Cc: Amit Shah <amit.shah@redhat.com>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix deadlock
Date: Tue, 29 Sep 2015 18:32:57 +0300
Message-ID: <560AAF29.9030705@virtuozzo.com>
In-Reply-To: <56050489.9010306@cn.fujitsu.com>

On 25.09.2015 11:23, Wen Congyang wrote:
> On 09/25/2015 04:03 PM, Denis V. Lunev wrote:
>> On 09/25/2015 04:21 AM, Wen Congyang wrote:
>>> On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
>>>> From: Igor Redko <redkoi@virtuozzo.com>
>>>>
>>>> Release the QEMU global mutex (QGM) before calling synchronize_rcu().
>>>> synchronize_rcu() waits for all readers to finish their critical
>>>> sections. There is at least one critical section in which we try
>>>> to get the QGM (the critical section is in address_space_rw() and
>>>> prepare_mmio_access() is trying to acquire the QGM).
>>>>
>>>> Both functions (migration_end() and migration_bitmap_extend())
>>>> are called from main thread which is holding QGM.
>>>>
>>>> Thus there is a race condition that ends up with deadlock:
>>>> main thread        working thread
>>>> Lock QGM                |
>>>> |             Call KVM_EXIT_IO handler
>>>> |                        |
>>>> |        Open rcu reader's critical section
>>>> Migration cleanup bh    |
>>>> |                       |
>>>> synchronize_rcu() is    |
>>>> waiting for readers     |
>>>> |            prepare_mmio_access() is waiting for QGM
>>>>     \                   /
>>>>            deadlock
>>>>
>>>> The patch just releases QGM before calling synchronize_rcu().
>>>>
>>>> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
>>>> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Amit Shah <amit.shah@redhat.com>
>>>> ---
>>>>    migration/ram.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 7f007e6..d01febc 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1028,12 +1028,16 @@ static void migration_end(void)
>>>>    {
>>>>        /* caller have hold iothread lock or is in a bh, so there is
>>>>         * no writing race against this migration_bitmap
>>>> +     * but rcu used not only for migration_bitmap, so we should
>>>> +     * release QGM or we get in deadlock.
>>>>         */
>>>>        unsigned long *bitmap = migration_bitmap;
>>>>        atomic_rcu_set(&migration_bitmap, NULL);
>>>>        if (bitmap) {
>>>>            memory_global_dirty_log_stop();
>>>> +        qemu_mutex_unlock_iothread();
>>>>            synchronize_rcu();
>>>> +        qemu_mutex_lock_iothread();
>>> migration_end() can be called in two cases:
>>> 1. migration_completed
>>> 2. migration is cancelled
>>>
>>> In case 1, you should not unlock iothread, otherwise, the vm's state may be changed
>>> unexpectedly.
>>
>> Sorry, but there is no very good choice here. We should either
>> unlock or not call synchronize_rcu(), which is also an option.
>>
>> Otherwise the rework would have to be much more substantial.
>
> I haven't reproduced this bug. But according to your description, the bug only exists
> in case 2. Is that right?
>
When migration completes successfully, the VM has already been stopped
before migration_end() is called. The VM must be running to reproduce
this bug. So yes, the bug exists only in case 2.

FYI
To reproduce this bug you need two hosts with qemu+libvirt (host0 and
host1) configured for migration.
0. Create a VM on host0 and install CentOS 7.
1. Shut down the VM.
2. Start the VM (virsh start <VM_name>) and right after that start
migration to host1 (something like 'virsh migrate --live --verbose
<VM_name> "qemu+ssh://host1/system"').
3. Stop the migration after ~1 sec (after the migration process has
started, but before it has completed, for example when you see
"Migration: [  5 %]").
This works for me 9 times out of 10.
Deadlock symptoms: no response from the VM and no response from the
qemu monitor (for example 'virsh qemu-monitor-command --hmp <VM_NAME>
"info migrate"' will hang indefinitely).

Another way:
0. Create a VM with an e1000 network card on host0 and install CentOS 7.
1. Run iperf in the VM (or any other network load).
2. Start the migration.
3. Stop the migration before it has completed.
For this approach the e1000 network card is essential because it
generates KVM_EXIT_MMIO.
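
For illustration, the lock ordering from the patch description can be
sketched outside QEMU with plain pthreads. This is only a toy model
under stated assumptions, not QEMU code: big_lock stands in for the QGM,
the active_readers counter stands in for RCU read-side critical
sections, wait_for_readers() is a made-up replacement for
synchronize_rcu(), and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: big_lock plays the role of the QGM,
 * active_readers models open RCU read-side critical sections. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int active_readers;
static atomic_int reader_entered;

/* Toy replacement for synchronize_rcu(): spin until no reader is active. */
static void wait_for_readers(void)
{
    while (atomic_load(&active_readers) > 0) {
        sched_yield();
    }
}

/* Worker thread: opens a read-side section and then needs the big lock,
 * similar to prepare_mmio_access() inside address_space_rw(). */
static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&active_readers, 1);   /* rcu_read_lock() analogue */
    atomic_store(&reader_entered, 1);
    pthread_mutex_lock(&big_lock);          /* blocks while main holds it */
    pthread_mutex_unlock(&big_lock);
    atomic_fetch_sub(&active_readers, 1);   /* rcu_read_unlock() analogue */
    return NULL;
}

/* Main-thread side; returns 0 once cleanup finished without deadlocking,
 * -1 if the worker thread could not be created. */
int cleanup_with_lock_dropped(void)
{
    pthread_t tid;

    atomic_store(&active_readers, 0);
    atomic_store(&reader_entered, 0);

    pthread_mutex_lock(&big_lock);          /* main thread holds the big lock */
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return -1;
    }
    /* Wait until the reader is inside its critical section. */
    while (!atomic_load(&reader_entered)) {
        sched_yield();
    }

    pthread_mutex_unlock(&big_lock);        /* the fix: drop the lock ...     */
    wait_for_readers();                     /* ... before waiting for readers */
    pthread_mutex_lock(&big_lock);          /* re-take it for remaining work  */

    assert(atomic_load(&active_readers) == 0);
    pthread_mutex_unlock(&big_lock);
    pthread_join(tid, NULL);
    return 0;
}
```

Calling cleanup_with_lock_dropped() returns 0 once the worker has left
its critical section. If the unlock/lock pair around wait_for_readers()
is removed, the main thread spins forever while the worker blocks on
big_lock, which is the deadlock described in the patch.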

>>
>> Den
>>
>>>>            g_free(bitmap);
>>>>        }
>>>>    @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
>>>>            atomic_rcu_set(&migration_bitmap, bitmap);
>>>>            qemu_mutex_unlock(&migration_bitmap_mutex);
>>>>            migration_dirty_pages += new - old;
>>>> +        qemu_mutex_unlock_iothread();
>>>>            synchronize_rcu();
>>>> +        qemu_mutex_lock_iothread();
>>> Hmm, I think it is OK to unlock iothread here
>>>
>>>>            g_free(old_bitmap);
>>>>        }
>>>>    }
>>>>
