From: Alexey Kardashevskiy
Date: Mon, 15 Sep 2014 20:50:21 +1000
Subject: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
To: "qemu-devel@nongnu.org", Stefan Hajnoczi
Cc: "Dr. David Alan Gilbert"
Message-ID: <5416C46D.7040105@ozlabs.ru>

Hi!

I am hitting a race in migration. I migrate a guest from one machine to another using libvirt:

    virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose --desturi qemu+ssh://legkvm/system --domain chig1

I.e. it copies the full disk, a qcow2 image with 20GB virtual size and 10GB of real disk space.

When migration finishes, process_incoming_migration_co() calls bdrv_invalidate_cache_all(), which calls qcow2_invalidate_cache(), which calls qcow2_close(); qcow2_close() destroys l2_table_cache and refcount_block_cache. It also calls qcow2_cache_flush(). All good.
However, somehow after migration has completed as described above, qcow2_co_flush_to_os() is called again, and I either get a crash in qcow2_cache_flush() (because @c == NULL) or hit the assertion from the subject line (because s->lock is not locked). The backtrace of the crash is below. In qcow2_co_flush_to_os(), @bs points to valid data while @s is empty.

The libvirt XML I used is at the end of this email, but it does not seem essential to the problem: it only happens on two particular machines, and I cannot reproduce it on the machines I have locally.

It looks to me like qcow2_close() is called while there is still a qcow2_co_flush_to_os() coroutine in flight (I believe there is always one). Why is this happening, and how should it be fixed properly?

Thanks.

Program received signal SIGSEGV, Segmentation fault.
0x0000000010504204 in qcow2_cache_flush (bs=0x10019aab420, c=0x0) at /home/alexey/p/qemu/block/qcow2-cache.c:174
174         for (i = 0; i < c->size; i++) {
(gdb) bt
#0  0x0000000010504204 in qcow2_cache_flush (bs=0x10019aab420, c=0x0) at /home/alexey/p/qemu/block/qcow2-cache.c:174
#1  0x00000000104f557c in qcow2_co_flush_to_os (bs=0x10019aab420) at /home/alexey/p/qemu/block/qcow2.c:2162
#2  0x00000000104c126c in bdrv_co_flush (bs=0x10019aab420) at /home/alexey/p/qemu/block.c:4971
#3  0x00000000104b2000 in nbd_trip (opaque=0x10019cf75c0) at /home/alexey/p/qemu/nbd.c:1259
#4  0x00000000104d17d4 in coroutine_trampoline (i0=0x100, i1=0x19cd5c00) at /home/alexey/p/qemu/coroutine-ucontext.c:118
#5  0x00003fff94ff099c in .__makecontext () from /usr/lib64/libc.so.6
#6  0x0eeabf8ea4edc5a2 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) up
#1  0x00000000104f557c in qcow2_co_flush_to_os (bs=0x10019aab420) at /home/alexey/p/qemu/block/qcow2.c:2162
2162        ret = qcow2_cache_flush(bs, s->l2_table_cache);
(gdb) p s
$1 = (BDRVQcowState *) 0x10019aaf300
(gdb) p s->l2_table_cache
$2 = (Qcow2Cache *) 0x0
(gdb) p *s
$3 = {
  cluster_bits = 0x0,
  cluster_size = 0x0,
  cluster_sectors = 0x0,
  l2_bits = 0x0,
  l2_size = 0x0,
  l1_size = 0x0,
  l1_vm_state_index = 0x0,
  csize_shift = 0x0,
  csize_mask = 0x0,
  cluster_offset_mask = 0x0,
  l1_table_offset = 0x0,
  l1_table = 0x0,
  l2_table_cache = 0x0,
  refcount_block_cache = 0x0,
  cluster_cache = 0x0,
  cluster_data = 0x0,
  cluster_cache_offset = 0x0,
  cluster_allocs = {
    lh_first = 0x0
  },
  refcount_table = 0x0,
  refcount_table_offset = 0x0,
  refcount_table_size = 0x0,
  free_cluster_index = 0x0,
  free_byte_offset = 0x0,
  lock = {
    locked = 0x1,
    queue = {
      entries = {
        tqh_first = 0x0,
        tqh_last = 0x0
      }
    }
  },
  crypt_method = 0x0,
  crypt_method_header = 0x0,
  aes_encrypt_key = {
    rd_key = {0x0 },
    rounds = 0x0
  },
  aes_decrypt_key = {
    rd_key = {0x0 },
    rounds = 0x0
  },
  snapshots_offset = 0x0,
  snapshots_size = 0x0,
  nb_snapshots = 0x0,
  snapshots = 0x0,
  flags = 0x0,
  qcow_version = 0x0,
  use_lazy_refcounts = 0x0,
  refcount_order = 0x0,
  discard_passthrough = {0x0, 0x0, 0x0, 0x0, 0x0},
  overlap_check = 0x0,
  incompatible_features = 0x0,
  compatible_features = 0x0,
  autoclear_features = 0x0,
  unknown_header_fields_size = 0x0,
  unknown_header_fields = 0x0,
  unknown_header_ext = {
    lh_first = 0x0
  },
  discards = {
    tqh_first = 0x0,
    tqh_last = 0x0
  },
  cache_discards = 0x0
}
(gdb) p *bs
$4 = {
  total_sectors = 0x2800000,
  read_only = 0x0,
  open_flags = 0x2062,
  encrypted = 0x0,
  valid_key = 0x0,
  sg = 0x0,
  copy_on_read = 0x0,
  drv = 0x1078d440 ,
  opaque = 0x10019aaf300,
  dev = 0x10019a90b38,
  dev_ops = 0x105b49b0 ,
  dev_opaque = 0x10019a90b38,
  aio_context = 0x10019a7f270,
  aio_notifiers = {
    lh_first = 0x10019cf74f0
  },
  filename = "/var/lib/libvirt/images/chig1.qcow2",
  backing_file = "",
  backing_format = "",
  full_open_options = 0x10019c2a030,
  exact_filename = "/var/lib/libvirt/images/chig1.qcow2",
  backing_hd = 0x0,
  file = 0x10019aae3b0,
  close_notifiers = {
    notifiers = {
      lh_first = 0x10019cf6460
    }
  },
  before_write_notifiers = {
    notifiers = {
      lh_first = 0x0
    }
  },
  serialising_in_flight = 0x0,
  throttle_state = {
    cfg = {
      buckets = {{avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0},
                 {avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0}, {avg = 0, max = 0, level = 0}},
      op_size = 0x0
    },
    previous_leak = 0x0,
    timers = {0x0, 0x0},
    clock_type = QEMU_CLOCK_REALTIME,
    read_timer_cb = 0x0,
    write_timer_cb = 0x0,
    timer_opaque = 0x0
  },
  throttled_reqs = {{
      entries = {
        tqh_first = 0x0,
        tqh_last = 0x10019aac188
      }
    }, {
      entries = {
        tqh_first = 0x0,
        tqh_last = 0x10019aac198
      }
    }},
  io_limits_enabled = 0x0,
  nr_bytes = {0x0, 0x0, 0x0},
  nr_ops = {0x0, 0x0, 0x0},
  total_time_ns = {0x0, 0x0, 0x0},
  wr_highest_sector = 0x27fffff,
  bl = {
    max_discard = 0x0,
    discard_alignment = 0x0,
    max_write_zeroes = 0x0,
    write_zeroes_alignment = 0x80,
    opt_transfer_length = 0x0,
    opt_mem_alignment = 0x1000
  },
  growable = 0x0,
  zero_beyond_eof = 0x1,
  request_alignment = 0x200,
  guest_block_size = 0x200,
  enable_write_cache = 0x1,
  on_read_error = BLOCKDEV_ON_ERROR_REPORT,
  on_write_error = BLOCKDEV_ON_ERROR_ENOSPC,
  iostatus_enabled = 0x1,
  iostatus = BLOCK_DEVICE_IO_STATUS_OK,
  node_name = "",
  node_list = {
    tqe_next = 0x0,
    tqe_prev = 0x0
  },
  device_name = "drive-virtio-disk0",
  device_list = {
    tqe_next = 0x0,
    tqe_prev = 0x1078b328
  },
  dirty_bitmaps = {
    lh_first = 0x0
  },
  refcnt = 0x2,
  tracked_requests = {
    lh_first = 0x0
  },
  op_blockers = {{ lh_first = 0x0 } },
  job = 0x0,
  options = 0x10019aa9db0,
  detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF,
  backing_blocker = 0x0
}

[root@chikvm ~]# cat chig1-aik.xml
chig1 bbf91237-3c78-489e-b426-ab593806c78b 4194304 4194304 1 /machine hvm destroy restart restart /usr/bin/qemu-system-ppc64.aik