From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Eric Blake <eblake@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
Date: Tue, 16 Sep 2014 22:02:18 +1000
Message-ID: <541826CA.7050607@ozlabs.ru>
In-Reply-To: <5416C46D.7040105@ozlabs.ru>

Hi!

I am having problems when migrating a guest via libvirt like this:

virsh migrate --live --persistent --undefinesource --copy-storage-all
--verbose --desturi qemu+ssh://legkvm/system --domain chig1

The XML used to create the guest is at the end of this mail.

I see an NBD FLUSH command arrive after the destination QEMU has received
EOF for the migration stream, and this produces a crash in
qcow2_co_flush_to_os() because s->lock is not locked or s->l2_table_cache
is NULL.


I observe two scenarios. The first is an assertion failure in
qemu_co_mutex_unlock(&s->lock), right before "return 0":

process_incoming_migration_co()
qcow2_invalidate_cache()
qcow2_close()
qcow2_cache_flush()
bdrv_flush()
bdrv_co_flush()
qemu_coroutine_yield()
nbd_trip() NBD_CMD_FLUSH
bdrv_co_flush()
qcow2_co_flush_to_os()

In the second, EOF is handled completely and the next FLUSH crashes because
s->l2_table_cache == NULL.
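
To make the two interleavings easier to reason about, here is a toy Python model, not QEMU code. Generators stand in for coroutines, and each `yield` marks a point where a coroutine waits. The classes `CoMutex` and `State` and the functions `invalidate_cache`, `flush_to_os` and `run` are hypothetical names for illustration. The model also assumes that the invalidation path tears down the caches and then re-initialises the driver state (including the mutex) while a flush may be suspended; the traces above do not prove this.

```python
class CoMutex:
    """Toy coroutine mutex: only a flag, no wait queue."""

    def __init__(self):
        self.locked = False

    def lock(self):
        if self.locked:
            raise RuntimeError("toy model has no wait queue")
        self.locked = True

    def unlock(self):
        # Same condition as the assertion in the subject line.
        if not self.locked:
            raise AssertionError("mutex->locked == 1")
        self.locked = False


class State:
    """Stand-in for the per-image driver state (lock plus one cache)."""

    def __init__(self):
        self.lock = CoMutex()
        self.l2_table_cache = object()


def invalidate_cache(s):
    """Migration side: close the image, then reopen it (assumed behaviour)."""
    yield "close: waiting for flush"      # flush inside close yields
    s.l2_table_cache = None               # close tears down the caches
    yield "closed, not yet reopened"
    s.lock = CoMutex()                    # reopen re-initialises the state
    s.l2_table_cache = object()


def flush_to_os(s):
    """NBD side: handle a FLUSH request under the lock."""
    s.lock.lock()
    if s.l2_table_cache is None:
        raise RuntimeError("l2_table_cache is NULL")
    yield "flush: waiting for I/O"
    s.lock.unlock()


def run(schedule):
    """Resume the named coroutines in order; report the first failure."""
    s = State()
    coros = {"invalidate": invalidate_cache(s), "flush": flush_to_os(s)}
    for name in schedule:
        try:
            next(coros[name])
        except StopIteration:
            pass                          # coroutine finished normally
        except (AssertionError, RuntimeError) as exc:
            return f"{name}: {exc}"
    return "ok"


# Scenario 1: FLUSH starts while close is suspended, resumes after the reset.
scenario_1 = ["invalidate", "flush", "invalidate", "invalidate", "flush"]
# Scenario 2: FLUSH starts after close finished but before the reopen.
scenario_2 = ["invalidate", "invalidate", "flush"]
# No overlap: invalidation completes before FLUSH starts.
no_overlap = ["invalidate", "invalidate", "invalidate", "flush", "flush"]
```

In this model `run(scenario_1)` fails in the unlock step and `run(scenario_2)` fails on the missing cache, while `run(no_overlap)` completes. It only illustrates why the ordering matters; it says nothing about where the real fix belongs.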


Please help me understand what is going on here. Thanks!



Probably this is enough :) If it is not, I pushed a debug branch to
git@github.com:aik/qemu.git
 * [new branch]      mig-dbg-legkvm -> mig-dbg


I added some traces and here is the log from the destination:

/home/alexey/p/qemu/nbd.c:nbd_trip():L1293: Request/Reply complete
+++Q+++ spapr_tce_pre_load 107
liobn 80000000 nb=262144 bus_off=0, shift=12, table=0x3fff915d0010
+++Q+++ (44846) qemu_loadvm_state 1000 EOF Received! section_type=0 (EOF=0)
+++Q+++ (44846) qemu_loadvm_state 1007
+++Q+++ (44846) qemu_loadvm_state 1016
+++Q+++ (44846) qcow2_close 1399 START
+++Q+++ (44846) qcow2_close 1404
_qcow2_cache_flush 0x10002b24030 0x100028e1390 - 0x100028df300 0x100028e1390
+++Q+++ (44846) bdrv_flush 5099
+++Q+++ (44846) bdrv_co_flush 4978
+++Q+++ (44846) bdrv_co_flush 4995
+++Q+++ (44846) bdrv_co_flush 4997
+++Q+++ (44846) bdrv_co_flush 5001
+++Q+++ (44846) bdrv_co_flush 5003
+++Q+++ (44846) bdrv_co_flush 5028
+++Q+++ (44846) bdrv_co_flush 4965
+++Q+++ (44846) bdrv_flush 5101
_qcow2_cache_flush 0x10002b24030 0x100028e13b0 - 0x100028df300 0x100028e1390
+++Q+++ (44846) bdrv_flush 5099
+++Q+++ (44846) bdrv_co_flush 4978
+++Q+++ (44846) bdrv_co_flush 4995
+++Q+++ (44846) bdrv_co_flush 4997
+++Q+++ (44846) bdrv_co_flush 5001
+++Q+++ (44846) bdrv_co_flush 5003
+++Q+++ (44846) bdrv_co_flush 5028
+++Q+++ (44846) bdrv_co_flush 4965
+++Q+++ (44846) bdrv_flush 5101
+++Q+++ (44846) qcow2_close 1409
+++Q+++ (44846) qcow2_close 1414
+++Q+++ (44846) qcow2_close 1422
+++Q+++ (44846) qcow2_close 1425 DONE!
+++Q+++ qcow2_invalidate_cache 1459
/home/alexey/p/qemu/nbd.c:nbd_trip():L1164: Reading request.
/home/alexey/p/qemu/nbd.c:nbd_receive_request():L785: Got request: { magic = 0x25609513, .type = 65539, from = 0 , len = 0 }
/home/alexey/p/qemu/nbd.c:nbd_co_receive_request():L1130: Decoding type
/home/alexey/p/qemu/nbd.c:nbd_trip():L1257: Request type is FLUSH
+++Q+++ (44846) nbd_trip 1258 bs=0x100028db420 START
+++Q+++ (44846) qcow2_co_flush_to_os 2171
_qcow2_cache_flush 0x10002e5f030 (nil) - 0x100028df300 (nil)
2014-09-16 11:34:36.731+0000: shutting down

<here it crashed> because the first (nil) is dereferenced as "c->size".
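
As a cross-check of the request shown in the log, the value `.type = 65539` can be split into a command and a flag. The sketch below is Python, not QEMU code, and the constants are my understanding of the NBD protocol (command in the low 16 bits, FUA flag at bit 16, FLUSH = 3); treat them as assumptions and verify against the nbd.h in your tree. `decode_request_type` is a hypothetical helper name.

```python
# Assumed NBD protocol constants (check against your tree's nbd.h).
NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_MASK_COMMAND = 0x0000FFFF
NBD_CMD_FLAG_FUA = 1 << 16
NBD_COMMANDS = {0: "READ", 1: "WRITE", 2: "DISC", 3: "FLUSH", 4: "TRIM"}


def decode_request_type(req_type):
    """Split a raw request type into (command name, list of flag names)."""
    command = NBD_COMMANDS.get(req_type & NBD_CMD_MASK_COMMAND, "UNKNOWN")
    flags = ["FUA"] if req_type & NBD_CMD_FLAG_FUA else []
    return command, flags
```

Under these assumptions, 65539 is 0x10003, so `decode_request_type(65539)` gives a FLUSH with the FUA flag set, which agrees with the "Request type is FLUSH" line in the trace.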




This is the sender:


+++Q+++ (67154) qemu_savevm_state_complete 747
+++Q+++ (67154) qemu_savevm_state_complete 749
+++Q+++ (67154) qemu_savevm_state_complete 751
+++Q+++ (67154) migration_thread 617
+++Q+++ (67154) migration_thread 628
+++Q+++ (67154) migration_thread 667
+++Q+++ (67154) migration_thread 684
+++Q+++ (67154) migration_thread 686
+++Q+++ (67154) migration_thread 688
+++Q+++ (67154) bdrv_flush 5099
+++Q+++ (67154) bdrv_co_flush 4978
+++Q+++ (67154) bdrv_co_flush 5028
+++Q+++ (67154) nbd_client_session_co_flush 305
/home/alexey/p/qemu/nbd.c:nbd_send_request():L739: Sending request to client: { .from = 0, .len = 0, .handle = 1099744563680, .type=65539}
/home/alexey/p/qemu/nbd.c:nbd_receive_reply():L806: read failed



This is the XML:


[root@chikvm ~]# cat chig1-aik.xml
<domain type='kvm'>
  <name>chig1</name>
  <uuid>bbf91237-3c78-489e-b426-ab593806c78b</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64' machine='pseries'>hvm</type>
    <boot dev='hd'/>
    <boot dev='network'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-ppc64.aik</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/chig1.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:27:70:6f'/>
      <source bridge='brenP1p9s0f0'/>
      <driver name='qemu'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </console>
    <video>
      <model type='vga' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01'
function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'/>
</domain>



-- 
Alexey
