From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Juan Quintela <quintela@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] postcopy migration hangs while loading virtio state
Date: Mon, 24 Apr 2017 11:53:45 +0100
Message-ID: <20170424105344.GF2362@work-vm>
In-Reply-To: <70ef6f1f-27f4-133f-ab33-03a30f19867b@de.ibm.com>

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> David, Juan,
> 
> I can trigger a hang of postcopy migration (when I do it early) so that
> both sides end up in the paused state. It looks like one thread is still
> loading vmstate for virtio, and this load accesses guest memory and
> triggers a userfault.

It's perfectly legal for the destination to cause a userfault as it loads
the virtio queue - the queue should be loaded by the main thread from the
buffer while the 'listening' thread waits for the incoming page data.

Can you turn on the following tracing please (one way to enable these is
shown after the list):
  destination:
    postcopy_ram_fault_thread_request
    postcopy_place_page
    postcopy_place_page_zero

  source:
    migrate_handle_rp_req_pages
    ram_save_queue_pages
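
One way to turn those on, assuming your build has a suitable trace backend
compiled in (e.g. 'simple' or 'log'), is on the command line:

  qemu-system-s390x ... \
    -trace enable=postcopy_ram_fault_thread_request \
    -trace enable=postcopy_place_page \
    -trace enable=postcopy_place_page_zero

(and likewise for the two source-side events).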

You should see:
    virtio does the access
    the access generates a userfault
    postcopy_ram_fault_thread_request sends the request to the source

    the source sees migrate_handle_rp_req_pages
    queues it with ram_save_queue_pages

    the destination sees the page arrive and
    postcopy_place_page or postcopy_place_page_zero

Some of that might disappear if the page was already on its way.
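
For reference, the libvirt <qemu:arg> passthrough quoted below corresponds
to this plain QEMU command line, should anyone want to reproduce without
libvirt:

  -drive driver=null-aio,id=null1,if=none,size=100G \
  -device virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4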

Dave

> Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)):
> #0  0x0000000001017130 in lduw_he_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:317
> 		<---- according to the host kernel this thread hangs in handle_userfault.
> #1  0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:359
> #2  0x0000000001025840 in address_space_lduw_internal_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /root/qemu/memory_ldst.inc.c:284
> #3  0x00000000010259a6 in address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0) at /root/qemu/memory_ldst.inc.c:315
> #4  0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0, addr=2) at /root/qemu/memory_ldst.inc.c:334
> #5  0x0000000001116c10 in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0, pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166
> #6  0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at /root/qemu/hw/virtio/virtio.c:263
> #7  0x000000000111daea in virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at /root/qemu/hw/virtio/virtio.c:2168
> #8  0x000000000111d0cc in virtio_device_get (f=0x28353600, opaque=0x28275090, size=0, field=0x16adf38 <__compound_literal.0>) at /root/qemu/hw/virtio/virtio.c:1975
> #9  0x00000000012a7f50 in vmstate_load_state (f=0x28353600, vmsd=0x14db480 <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at /root/qemu/migration/vmstate.c:128
> #10 0x00000000010cbc08 in vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at /root/qemu/migration/savevm.c:732
> #11 0x00000000010ced6e in qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1890
> #12 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947
> #13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1707
> #14 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at /root/qemu/migration/savevm.c:1770
> #15 0x00000000010cf056 in qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1960
> #16 0x00000000010cf2b6 in qemu_loadvm_state (f=0x282c7d20) at /root/qemu/migration/savevm.c:2020
> #17 0x000000000129d234 in process_incoming_migration_co (opaque=0x282c7d20) at /root/qemu/migration/migration.c:404
> #18 0x00000000014b6ca2 in coroutine_trampoline (i0=0, i1=674533168) at /root/qemu/util/coroutine-ucontext.c:79
> #19 0x000003ff8de530ea in __makecontext_ret () from target:/lib64/libc.so.6
> 
> Thread 8 (Thread 0x3ff40b38910 (LWP 21165)):
> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
> #1  0x00000000014933ce in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
> #2  0x00000000014935d0 in qemu_event_wait (ev=0x17169b0 <mis_current+8>) at /root/qemu/util/qemu-thread-posix.c:399
> #3  0x00000000010ce094 in postcopy_ram_listen_thread (opaque=0x282c7d20) at /root/qemu/migration/savevm.c:1532
> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 7 (Thread 0x3ff41338910 (LWP 21164)):
> #0  0x000003ff8defd72c in ?? () from target:/lib64/libc.so.6
> #1  0x00000000012af920 in postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at /root/qemu/migration/postcopy-ram.c:437
> #2  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #3  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 6 (Thread 0x3ff41b38910 (LWP 21163)):
> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at /root/qemu/util/qemu-thread-posix.c:133
> #2  0x00000000010c6080 in do_data_decompress (opaque=0x283023d8) at /root/qemu/migration/ram.c:2277
> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 5 (Thread 0x3ff42338910 (LWP 21162)):
> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at /root/qemu/util/qemu-thread-posix.c:133
> #2  0x00000000010c6080 in do_data_decompress (opaque=0x28302360) at /root/qemu/migration/ram.c:2277
> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 4 (Thread 0x3ff89287910 (LWP 21148)):
> #0  0x000003ff8e010abc in __lll_lock_wait () from target:/lib64/libpthread.so.0
> #1  0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from target:/lib64/libpthread.so.0
> #2  0x000003ff8e00d4ee in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> #3  0x0000000001492d66 in qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688 <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133
> #4  0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at /root/qemu/cpus.c:1085
> #5  0x00000000010773aa in qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123
> #6  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #7  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)):
> #0  0x000003ff8defd832 in ppoll () from target:/lib64/libc.so.6
> #1  0x000000000148b092 in qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at /root/qemu/util/qemu-timer.c:322
> #2  0x000000000148f46a in aio_poll (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622
> #3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at /root/qemu/iothread.c:59
> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> Thread 2 (Thread 0x3ff8afff910 (LWP 21142)):
> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
> #1  0x00000000014933ce in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
> #2  0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4 <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399
> #3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at /root/qemu/util/rcu.c:249
> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> 
> 
> It looks like this
>     <qemu:arg value='-drive'/>
>     <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
>     <qemu:arg value='-device'/>
>     <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>
> 
> is necessary to trigger the hang; with num-queues=1 everything seems to
> work fine, so it might be a virtio issue after all?
> 
> Any idea?
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Thread overview: 22+ messages
2017-04-24 10:31 [Qemu-devel] postcopy migration hangs while loading virtio state Christian Borntraeger
2017-04-24 10:53 ` Dr. David Alan Gilbert [this message]
2017-04-24 13:02   ` Christian Borntraeger
2017-04-24 14:35     ` Dr. David Alan Gilbert
2017-04-24 18:17       ` Christian Borntraeger
2017-04-24 19:12         ` Dr. David Alan Gilbert
2017-04-25 13:27           ` Christian Borntraeger
2017-04-26  9:59           ` Christian Borntraeger
2017-04-26 10:13             ` Christian Borntraeger
2017-04-26 11:01             ` Dr. David Alan Gilbert
2017-04-26 11:37               ` Christian Borntraeger
2017-04-26 11:45               ` Christian Borntraeger
2017-04-26 12:07                 ` Christian Borntraeger
2017-06-30 16:31                   ` Dr. David Alan Gilbert
2017-07-03 14:28                     ` Christian Borntraeger
2017-07-03 16:25                     ` Michael S. Tsirkin
2017-07-03 19:07                       ` Dr. David Alan Gilbert
2017-07-04  7:48                         ` Christian Borntraeger
2017-07-04  8:16                           ` Martin Schwidefsky
2017-07-04  8:27                             ` Christian Borntraeger
2017-07-10  8:05                             ` Christian Borntraeger
2017-04-24 13:11   ` Christian Borntraeger
