From: Fabiano Rosas <farosas@suse.de>
To: Prasad Pandit <ppandit@redhat.com>, qemu-devel@nongnu.org
Cc: peterx@redhat.com, berrange@redhat.com,
	Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH v8 0/7] Allow to enable multifd and postcopy migration together
Date: Mon, 31 Mar 2025 17:54:41 -0300
Message-ID: <87plhwgbu6.fsf@suse.de>
In-Reply-To: <20250318123846.1370312-1-ppandit@redhat.com>

Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Hello,
>
> * This series (v8) splits earlier patch-2 which enabled multifd and
>   postcopy options together into two separate patches. One modifies
>   the channel discovery in migration_ioc_process_incoming() function,
>   and second one enables the multifd and postcopy migration together.
>
>   It also adds the 'save_postcopy_prepare' savevm_state handler to
>   enable different sections to take an action just before the Postcopy
>   phase starts. Thank you Peter for these patches.
> ===
> 67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             152.66s   81 subtests passed

The postcopy/multifd/plain test still hangs from time to time. I see a
vmstate load function trying to access guest memory while the
postcopy-listen thread has already finished its work and is waiting
for qemu_loadvm_state() (frame #18) to return and set
main_thread_load_event.

Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
#0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:274
#1  0x0000560b135103aa in flatview_read_continue_step (attrs=..., buf=0x560b168a5930 "U\252\022\006\016\a1\300\271", len=9216, mr_addr=831488, l=0x7fbc465ff980, mr=0x560b166c5070) at ../system/physmem.c:3056
#2  0x0000560b1351042e in flatview_read_continue (fv=0x560b16c606a0, addr=831488, attrs=..., ptr=0x560b168a5930, len=9216, mr_addr=831488, l=9216, mr=0x560b166c5070) at ../system/physmem.c:3073
#3  0x0000560b13510533 in flatview_read (fv=0x560b16c606a0, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3103
#4  0x0000560b135105be in address_space_read_full (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3116
#5  0x0000560b135106e7 in address_space_rw (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3144
#6  0x0000560b13510848 in cpu_physical_memory_rw (addr=831488, buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3170
#7  0x0000560b1338f5a5 in cpu_physical_memory_read (addr=831488, buf=0x560b168a5930, len=9216) at qemu/include/exec/cpu-common.h:148
#8  0x0000560b1339063c in patch_hypercalls (s=0x560b168840c0) at ../hw/i386/vapic.c:547
#9  0x0000560b1339096d in vapic_prepare (s=0x560b168840c0) at ../hw/i386/vapic.c:629
#10 0x0000560b13390e8b in vapic_post_load (opaque=0x560b168840c0, version_id=1) at ../hw/i386/vapic.c:789
#11 0x0000560b135b4924 in vmstate_load_state (f=0x560b16c53400, vmsd=0x560b147c6cc0 <vmstate_vapic>, opaque=0x560b168840c0, version_id=1) at ../migration/vmstate.c:234
#12 0x0000560b132a15b8 in vmstate_load (f=0x560b16c53400, se=0x560b16893390) at ../migration/savevm.c:972
#13 0x0000560b132a4f28 in qemu_loadvm_section_start_full (f=0x560b16c53400, type=4 '\004') at ../migration/savevm.c:2746
#14 0x0000560b132a5ae8 in qemu_loadvm_state_main (f=0x560b16c53400, mis=0x560b16877f20) at ../migration/savevm.c:3058
#15 0x0000560b132a45d0 in loadvm_handle_cmd_packaged (mis=0x560b16877f20) at ../migration/savevm.c:2451
#16 0x0000560b132a4b36 in loadvm_process_command (f=0x560b168c3b60) at ../migration/savevm.c:2614
#17 0x0000560b132a5b96 in qemu_loadvm_state_main (f=0x560b168c3b60, mis=0x560b16877f20) at ../migration/savevm.c:3073
#18 0x0000560b132a5db7 in qemu_loadvm_state (f=0x560b168c3b60) at ../migration/savevm.c:3150
#19 0x0000560b13286271 in process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:892
#20 0x0000560b137cb6d4 in coroutine_trampoline (i0=377836416, i1=22027) at ../util/coroutine-ucontext.c:175
#21 0x00007fbc4786a79e in ??? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:103


Thread 10 (Thread 0x7fffce7fc700 (LWP 11778) "mig/dst/listen"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x000055555614e33f in qemu_futex_wait (f=0x5555576f6fc0, val=4294967295) at qemu/include/qemu/futex.h:29
#2  0x000055555614e505 in qemu_event_wait (ev=0x5555576f6fc0) at ../util/qemu-thread-posix.c:464
#3  0x0000555555c44eb1 in postcopy_ram_listen_thread (opaque=0x5555576f6f20) at ../migration/savevm.c:2135
#4  0x000055555614e6b8 in qemu_thread_start (args=0x5555582c8480) at ../util/qemu-thread-posix.c:541
#5  0x00007ffff72626ea in start_thread (arg=0x7fffce7fc700) at pthread_create.c:477
#6  0x00007ffff532158f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7fffceffd700 (LWP 11777) "mig/dst/fault"):
#0  0x00007ffff5314a89 in __GI___poll (fds=0x7fffc0000b60, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x0000555555c3be3f in postcopy_ram_fault_thread (opaque=0x5555576f6f20) at ../migration/postcopy-ram.c:999
#2  0x000055555614e6b8 in qemu_thread_start (args=0x555557735be0) at ../util/qemu-thread-posix.c:541
#3  0x00007ffff72626ea in start_thread (arg=0x7fffceffd700) at pthread_create.c:477
#4  0x00007ffff532158f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Breaking with gdb and stepping through the memcpy code generates a
request for a page that's seemingly already in the receivedmap:

(gdb) x/i $pc
=> 0x7ffff5399d14 <__memcpy_evex_unaligned_erms+86>:    rep movsb %ds:(%rsi),%es:(%rdi)
(gdb) p/x $rsi
$1 = 0x7fffd68cc000
(gdb) si
postcopy_ram_fault_thread_request Request for HVA=0x7fffd68cc000 rb=pc.ram offset=0xcc000 pid=11754
// these are my printfs:
postcopy_request_page:
migrate_send_rp_req_pages: 
migrate_send_rp_req_pages: mutex
migrate_send_rp_req_pages: received

// gdb hangs here, it looks like the page wasn't populated?
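
To spell out what the "received" printf above seems to imply, here is a
toy model of the check, not the QEMU code. It assumes a page request is
skipped whenever the page's bit is already set in the receivedmap.
ToyBlock and the toy_* helpers are hypothetical names made up for this
sketch, and the 4 KiB page size is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */
#define TOY_NPAGES    512u    /* toy block covers 2 MiB */

/* Hypothetical stand-in for a RAM block: one receivedmap bit per page. */
typedef struct {
    uint64_t receivedmap[TOY_NPAGES / 64];
    unsigned requests_sent;   /* page requests forwarded to the source */
} ToyBlock;

/* Record that the page containing 'offset' has arrived. */
static void toy_mark_received(ToyBlock *b, uint64_t offset)
{
    uint64_t page = offset / TOY_PAGE_SIZE;
    assert(page < TOY_NPAGES);
    b->receivedmap[page / 64] |= UINT64_C(1) << (page % 64);
}

/* Test the receivedmap bit for the page containing 'offset'. */
static bool toy_is_received(const ToyBlock *b, uint64_t offset)
{
    uint64_t page = offset / TOY_PAGE_SIZE;
    assert(page < TOY_NPAGES);
    return (b->receivedmap[page / 64] >> (page % 64)) & 1;
}

/*
 * Handle a fault on 'offset'. Returns true if a request was sent to the
 * source, false if it was skipped because the bit was already set.
 */
static bool toy_request_page(ToyBlock *b, uint64_t offset)
{
    if (toy_is_received(b, offset)) {
        return false;         /* assumed: nothing is sent in this case */
    }
    b->requests_sent++;
    return true;
}
```

In this model, calling toy_mark_received() for offset 0xcc000 and then
toy_request_page() for the same offset sends nothing. If the page
behind that bit was never actually populated, nothing would resolve the
fault, which would match the hang above.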

I've had my share of postcopy for the day. Hopefully you'll be able to
figure out what the issue is.

- reproducer (the 2nd iteration already hangs for me):

$ for i in $(seq 1 9999); do echo "$i ============="; \
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test \
--full -r /x86_64/migration/postcopy/multifd/plain || break ; done

- reproducer with traces and gdb:

$ for i in $(seq 1 9999); do echo "$i ============="; \
QTEST_TRACE="multifd_* -trace source_* -trace postcopy_* -trace savevm_* \
-trace loadvm_*" QTEST_QEMU_BINARY_DST='gdb --ex "handle SIGUSR1 \
noprint" --ex "run" --args ./qemu-system-x86_64' \
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test \
--full -r /x86_64/migration/postcopy/multifd/plain || break ; done

