From: "Denis V. Lunev"
Message-ID: <571E2049.2020706@virtuozzo.com>
Date: Mon, 25 Apr 2016 16:48:57 +0300
Subject: [Qemu-devel] long unresponsiveness or getting stuck in the migration code
To: Amit Shah, QEMU
Cc: Dmitry Mishin

Hello, Amit!

We have faced a very interesting issue with the QEMU migration code.
The migration thread performs the following operation:

#0  0x00007f61abe9978d in sendmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f61b2942055 in do_send_recv (sockfd=sockfd@entry=104,
    iov=iov@entry=0x7f61b71a8030, iov_cnt=<optimized out>,
    do_send=do_send@entry=true) at util/iov.c:104
#2  0x00007f61b2942528 in iov_send_recv (sockfd=104, iov=iov@entry=0x7f61b71a8030,
    iov_cnt=iov_cnt@entry=1, offset=27532, offset@entry=0, bytes=5236,
    bytes@entry=32768, do_send=do_send@entry=true) at util/iov.c:181
#3  0x00007f61b287724a in socket_writev_buffer (opaque=0x7f61b6ec8070,
    iov=0x7f61b71a8030, iovcnt=1, pos=<optimized out>)
    at migration/qemu-file-unix.c:43
#4  0x00007f61b2875caa in qemu_fflush (f=f@entry=0x7f61b71a0000)
    at migration/qemu-file.c:109
#5  0x00007f61b2875e1a in qemu_put_buffer (f=0x7f61b71a0000,
    buf=buf@entry=0x7f61b662e030 "", size=size@entry=842)
    at migration/qemu-file.c:323
#6  0x00007f61b287674f in qemu_put_buffer (size=842, buf=0x7f61b662e030 "",
    f=0x7f61b2875caa) at migration/qemu-file.c:589
#7  qemu_put_qemu_file (f_des=f_des@entry=0x7f61b71a0000, f_src=0x7f61b662e000)
    at migration/qemu-file.c:589
#8  0x00007f61b26fab01 in compress_page_with_multi_thread
    (bytes_transferred=0x7f61b2dfe578, offset=2138677280, block=0x7f61b51e9b80,
    f=0x7f61b71a0000) at /usr/src/debug/qemu-2.3.0/migration/ram.c:872
#9  ram_save_compressed_page (bytes_transferred=0x7f61b2dfe578, last_stage=true,
    offset=2138677280, block=0x7f61b51e9b80, f=0x7f61b71a0000)
    at /usr/src/debug/qemu-2.3.0/migration/ram.c:957
#10 ram_find_and_save_block (f=f@entry=0x7f61b71a0000,
    last_stage=last_stage@entry=true, bytes_transferred=0x7f61b2dfe578)
    at /usr/src/debug/qemu-2.3.0/migration/ram.c:1015
#11 0x00007f61b26faed5 in ram_save_complete (f=0x7f61b71a0000,
    opaque=<optimized out>) at /usr/src/debug/qemu-2.3.0/migration/ram.c:1280
#12 0x00007f61b26ff241 in qemu_savevm_state_complete_precopy (f=0x7f61b71a0000,
    iterable_only=iterable_only@entry=false)
    at /usr/src/debug/qemu-2.3.0/migration/savevm.c:976
#13 0x00007f61b2872ecb in migration_completion (start_time=<optimized out>,
    old_vm_running=<optimized out>, current_active_state=<optimized out>,
    s=0x7f61b2d8bfc0) at migration/migration.c:1212
#14 migration_thread (opaque=0x7f61b2d8bfc0) at migration/migration.c:1307
#15 0x00007f61abe92dc5 in start_thread (arg=0x7f6117ff8700) at pthread_create.c:308
#16 0x00007f61abbc028d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:11
which can take a really long time. The problem is that we have taken the
qemu_global_mutex here:

static void migration_completion(MigrationState *s, int current_active_state,
                                 bool *old_vm_running,
                                 int64_t *start_time)
{
    int ret;

    if (s->state == MIGRATION_STATUS_ACTIVE) {
        qemu_mutex_lock_iothread();
        *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
        *old_vm_running = runstate_is_running();

        ret = global_state_store();
        if (!ret) {
            ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
            if (ret >= 0) {
                qemu_file_set_rate_limit(s->file, INT64_MAX);
                qemu_savevm_state_complete_precopy(s->file, false);
            }
        }
        qemu_mutex_unlock_iothread();

and thus the QEMU process is unresponsive to any management requests. In our
case there is some misconfiguration and the data is not read on the other end,
but this could happen in other cases as well.

From my point of view we should drop the iothread mutex (i.e. call
qemu_mutex_unlock_iothread()) before any socket operation, but doing this in a
straight way (just dropping the lock, as in the sketch below) seems improper.

Do you have any opinion on the problem?

Den
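P.S. To make the question more concrete, below is a rough sketch of the
"straight way" I mean: releasing the iothread mutex only around the final
blocking write in migration_completion() quoted above. This is only an
illustration, not a tested change (all functions are the ones already quoted);
whether ram_save_complete() and the device vmstate handlers can really run
without the lock, even with the VM already stopped, is exactly the part that
looks improper to me.

            ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
            if (ret >= 0) {
                qemu_file_set_rate_limit(s->file, INT64_MAX);
                /* ILLUSTRATION ONLY: let the monitor run while the device
                 * state is written to a possibly stuck socket, then re-take
                 * the lock before the cleanup/resume path runs. */
                qemu_mutex_unlock_iothread();
                qemu_savevm_state_complete_precopy(s->file, false);
                qemu_mutex_lock_iothread();
            }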