From: "Denis V. Lunev" <den@virtuozzo.com>
Date: Mon, 25 Apr 2016 20:30:45 +0300
Subject: Re: [Qemu-devel] long irresponsibility or stuck in the migration code
To: "Dr. David Alan Gilbert"
Cc: Amit Shah, QEMU <qemu-devel@nongnu.org>, Dmitry Mishin

On 04/25/2016 08:18 PM, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@virtuozzo.com) wrote:
>> Hello, Amit!
>
> (That's unresponsiveness not irresponsibility :-)

;) thank you

>> We have faced a very interesting issue with the QEMU migration code.
>> The migration thread performs the following operation:
>>
>> #0  0x00007f61abe9978d in sendmsg () at ../sysdeps/unix/syscall-template.S:81
>> #1  0x00007f61b2942055 in do_send_recv (sockfd=sockfd@entry=104, iov=iov@entry=0x7f61b71a8030,
>>     iov_cnt=<optimized out>, do_send=do_send@entry=true) at util/iov.c:104
>> #2  0x00007f61b2942528 in iov_send_recv (sockfd=104, iov=iov@entry=0x7f61b71a8030, iov_cnt=iov_cnt@entry=1,
>>     offset=27532, offset@entry=0, bytes=5236, bytes@entry=32768, do_send=do_send@entry=true) at util/iov.c:181
>> #3  0x00007f61b287724a in socket_writev_buffer (opaque=0x7f61b6ec8070, iov=0x7f61b71a8030, iovcnt=1,
>>     pos=<optimized out>) at migration/qemu-file-unix.c:43
>> #4  0x00007f61b2875caa in qemu_fflush (f=f@entry=0x7f61b71a0000) at migration/qemu-file.c:109
>> #5  0x00007f61b2875e1a in qemu_put_buffer (f=0x7f61b71a0000, buf=buf@entry=0x7f61b662e030 "", size=size@entry=842)
>>     at migration/qemu-file.c:323
>> #6  0x00007f61b287674f in qemu_put_buffer (size=842, buf=0x7f61b662e030 "", f=0x7f61b2875caa)
>>     at migration/qemu-file.c:589
>> #7  qemu_put_qemu_file (f_des=f_des@entry=0x7f61b71a0000, f_src=0x7f61b662e000) at migration/qemu-file.c:589
>> #8  0x00007f61b26fab01 in compress_page_with_multi_thread (bytes_transferred=0x7f61b2dfe578,
>>     offset=2138677280, block=0x7f61b51e9b80, f=0x7f61b71a0000) at /usr/src/debug/qemu-2.3.0/migration/ram.c:872
>> #9  ram_save_compressed_page (bytes_transferred=0x7f61b2dfe578, last_stage=true,
>>     offset=2138677280, block=0x7f61b51e9b80, f=0x7f61b71a0000) at /usr/src/debug/qemu-2.3.0/migration/ram.c:957
>> #10 ram_find_and_save_block (f=f@entry=0x7f61b71a0000, last_stage=last_stage@entry=true,
>>     bytes_transferred=0x7f61b2dfe578) at /usr/src/debug/qemu-2.3.0/migration/ram.c:1015
>> #11 0x00007f61b26faed5 in ram_save_complete (f=0x7f61b71a0000, opaque=<optimized out>)
>>     at /usr/src/debug/qemu-2.3.0/migration/ram.c:1280
>> #12 0x00007f61b26ff241 in qemu_savevm_state_complete_precopy (f=0x7f61b71a0000,
>>     iterable_only=iterable_only@entry=false) at /usr/src/debug/qemu-2.3.0/migration/savevm.c:976
>> #13 0x00007f61b2872ecb in migration_completion (start_time=<optimized out>, old_vm_running=<optimized out>,
>>     current_active_state=<optimized out>, s=0x7f61b2d8bfc0) at migration/migration.c:1212
>> #14 migration_thread (opaque=0x7f61b2d8bfc0) at migration/migration.c:1307
>> #15 0x00007f61abe92dc5 in start_thread (arg=0x7f6117ff8700) at pthread_create.c:308
>> #16 0x00007f61abbc028d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:11
>>
>> which can take a really BIG period of time.
>>
>> The problem is that we have taken qemu_global_mutex:
>>
>> static void migration_completion(MigrationState *s, int current_active_state,
>>                                  bool *old_vm_running,
>>                                  int64_t *start_time)
>> {
>>     int ret;
>>
>>     if (s->state == MIGRATION_STATUS_ACTIVE) {
>>         qemu_mutex_lock_iothread();
>>         *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>         *old_vm_running = runstate_is_running();
>>         ret = global_state_store();
>>
>>         if (!ret) {
>>             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>             if (ret >= 0) {
>>                 qemu_file_set_rate_limit(s->file, INT64_MAX);
>>                 qemu_savevm_state_complete_precopy(s->file, false);
>>             }
>>         }
>>         qemu_mutex_unlock_iothread();
>>         ...
>>
>> and thus the QEMU process is unresponsive to any management requests.
>> In our case there is a misconfiguration and the destination does not
>> read the data, but this could happen in other cases too.
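>> For illustration only (an untested sketch, not a proper fix): a send
>> timeout on the migration socket would at least bound the stall. Assuming
>> we can get at the raw fd where the socket QEMUFile is created, something
>> like:
>>
>>     /* make a blocked sendmsg() fail after 30 seconds instead of
>>      * waiting forever while qemu_global_mutex is held */
>>     struct timeval tv = { .tv_sec = 30, .tv_usec = 0 };
>>     if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
>>         error_report("failed to set send timeout: %s", strerror(errno));
>>     }
>>
>> would make the migration fail with EAGAIN rather than hang, but it does
>> not make the monitor responsive while the send is in progress.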
>> From my point of view we should drop the iothread lock (i.e. call
>> qemu_mutex_unlock_iothread()) before any socket operation, but doing
>> this in a straightforward way (just dropping the lock) seems improper.
>>
>> Do you have any opinion on the problem?
> Yeah, I've seen this before; it needs fixing but it's not obvious how.
> If you set the migration speed reasonably, the amount of data sent at
> this point is a lot smaller, so the send shouldn't take too long -
> however, if the destination stalls at this point you're in trouble.
>
> The tricky bit is understanding exactly why we're holding the lock
> at this point; I can think of a few reasons but I'm not sure if it's
> all of them:
>   a) We don't want any hot-add/remove while we're trying to save the
>      device state.
>   b) We want to be able to stop the guest.
>   c) We want to be able to stop any IO.
>
> I did suggest ( https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg01711.html )
> that a lock-free monitor would be nice, where you could run some status
> commands and perhaps issue a migrate_cancel; but it sounds like it's
> messy untangling the monitor.
>
> If we knew all the reasons we were taking the lock there, then perhaps
> we could split it into finer locks and let the monitor carry on; but
> I'm sure it's a huge task.

I have the same feeling. Thank you for the response. At least I am not
alone here.
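Just to make the "straight way" concrete (an untested sketch of the
final part of migration_completion(), and exactly the thing that seems
improper):

    if (ret >= 0) {
        qemu_file_set_rate_limit(s->file, INT64_MAX);
        /* the guest is already stopped; drop the big lock so the
         * monitor stays responsive during the blocking send */
        qemu_mutex_unlock_iothread();
        qemu_savevm_state_complete_precopy(s->file, false);
        qemu_mutex_lock_iothread();
    }

With this the monitor could, for example, hot-unplug a device while its
state is still being serialized, which is exactly your reason (a). So
we need something smarter.

Den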