qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Prasad Pandit <ppandit@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org, berrange@redhat.com,
	Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH v9 0/7] Allow to enable multifd and postcopy migration together
Date: Tue, 29 Apr 2025 11:49:35 -0400
Message-ID: <aBD1D3obdWFta-1H@x1.local>
In-Reply-To: <CAE8KmOxR1EoyLK6+49bVJK9BW0NfhgEcE2_aVxQQjkBY9y1xwA@mail.gmail.com>

On Tue, Apr 29, 2025 at 08:50:19PM +0530, Prasad Pandit wrote:
> On Tue, 29 Apr 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> > Please don't rush to send. Again, let's verify the issue first before
> > resending anything.
> >
> > If you could reproduce it it would be perfect, then we can already verify
> > it.  Otherwise we may need help from Fabiano.  Let's not send anything if
> > you're not yet sure whether it works..  It can confuse people thinking
> > problem solved, but maybe not yet.
> 
> * No, the migration hang issue is not reproducing on my side. Earlier
> in this thread, Fabiano said you'll be better able to confirm the
> issue. (so its possible fix as well I guess)
> 
> * You don't have access to the set-up that he uses for running tests
> and merging patches? Would it be possible for you to run the same
> tests? (just checking, I don't know how co-maintainers work to
> test/merge patches)

No, I don't.

> 
> * If we don't send the patch, how will Fabiano test it? Should we wait
> for Fabiano to come back and then make this same patch in his set-up
> and test/verify it?

I thought you had already provided a diff.  That should be good enough for
verification.  If you really want to, you can repost, but please state
explicitly that you haven't verified the issue, so the patchset still needs
to be verified.

Fabiano should be back in early May.  If you want, you can try to work out
how to reproduce it by looking at why it triggered in the vapic path:

https://lore.kernel.org/all/87plhwgbu6.fsf@suse.de/#t

Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
#0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:274
#1  0x0000560b135103aa in flatview_read_continue_step (attrs=..., buf=0x560b168a5930 "U\252\022\006\016\a1\300\271", len=9216, mr_addr=831488, l=0x7fbc465ff980, mr=0x560b166c5070) at ../system/physmem.c:3056
#2  0x0000560b1351042e in flatview_read_continue (fv=0x560b16c606a0, addr=831488, attrs=..., ptr=0x560b168a5930, len=9216, mr_addr=831488, l=9216, mr=0x560b166c5070) at ../system/physmem.c:3073
#3  0x0000560b13510533 in flatview_read (fv=0x560b16c606a0, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3103
#4  0x0000560b135105be in address_space_read_full (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3116
#5  0x0000560b135106e7 in address_space_rw (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3144
#6  0x0000560b13510848 in cpu_physical_memory_rw (addr=831488, buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3170
#7  0x0000560b1338f5a5 in cpu_physical_memory_read (addr=831488, buf=0x560b168a5930, len=9216) at qemu/include/exec/cpu-common.h:148
#8  0x0000560b1339063c in patch_hypercalls (s=0x560b168840c0) at ../hw/i386/vapic.c:547
#9  0x0000560b1339096d in vapic_prepare (s=0x560b168840c0) at ../hw/i386/vapic.c:629
#10 0x0000560b13390e8b in vapic_post_load (opaque=0x560b168840c0, version_id=1) at ../hw/i386/vapic.c:789
#11 0x0000560b135b4924 in vmstate_load_state (f=0x560b16c53400, vmsd=0x560b147c6cc0 <vmstate_vapic>, opaque=0x560b168840c0, version_id=1) at ../migration/vmstate.c:234
#12 0x0000560b132a15b8 in vmstate_load (f=0x560b16c53400, se=0x560b16893390) at ../migration/savevm.c:972
#13 0x0000560b132a4f28 in qemu_loadvm_section_start_full (f=0x560b16c53400, type=4 '\004') at ../migration/savevm.c:2746
#14 0x0000560b132a5ae8 in qemu_loadvm_state_main (f=0x560b16c53400, mis=0x560b16877f20) at ../migration/savevm.c:3058
#15 0x0000560b132a45d0 in loadvm_handle_cmd_packaged (mis=0x560b16877f20) at ../migration/savevm.c:2451
#16 0x0000560b132a4b36 in loadvm_process_command (f=0x560b168c3b60) at ../migration/savevm.c:2614
#17 0x0000560b132a5b96 in qemu_loadvm_state_main (f=0x560b168c3b60, mis=0x560b16877f20) at ../migration/savevm.c:3073
#18 0x0000560b132a5db7 in qemu_loadvm_state (f=0x560b168c3b60) at ../migration/savevm.c:3150
#19 0x0000560b13286271 in process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:892
#20 0x0000560b137cb6d4 in coroutine_trampoline (i0=377836416, i1=22027) at ../util/coroutine-ucontext.c:175
#21 0x00007fbc4786a79e in ??? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:103

So _if_ the theory is correct, vapic's patch_hypercalls() might be reading
a zero page (GPA 831488 with len=9216, which IIUC covers three pages).
Maybe you can check when one of those pages is a zero page and when it is
not; then you may be able to figure out how to make it always a zero page
and hence reliably trigger a hang in post_load.
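
As a quick check of the page arithmetic above, here is a minimal sketch.
It assumes 4 KiB guest pages; GUEST_PAGE_SIZE, first_page() and
pages_spanned() are names made up for illustration, not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages (hypothetical constant, not a QEMU macro). */
#define GUEST_PAGE_SIZE 4096u

/* Index of the guest page that contains addr. */
uint64_t first_page(uint64_t addr)
{
    return addr / GUEST_PAGE_SIZE;
}

/* Number of guest pages touched by a read of len bytes starting at addr. */
uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    uint64_t first = addr / GUEST_PAGE_SIZE;
    uint64_t last = (addr + len - 1) / GUEST_PAGE_SIZE;
    return last - first + 1;
}
```

With addr=831488 (0xcb000, which is page aligned) and len=9216, the read
starts at page index 203 and touches three pages: two full pages plus the
first 1024 bytes of a third.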

You could also try writing a program in the guest that zeroes most pages
first, then trigger the migration (so that zero pages are sent during
multifd precopy) and start postcopy; you should then be able to observe a
vcpu hang, at least until postcopy completes.  However, I don't think it
will hang forever: once migration completes, UFFDIO_UNREGISTER removes the
userfaultfd tracking and kicks all the hung threads out, so the fault gets
resolved right at the completion of postcopy.  So it won't really hang
forever like what Fabiano reported here.  Meanwhile, we'll always want to
verify against the original reproducer, even if you manage to hang a vcpu
thread temporarily.
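
A rough sketch of what the guest-side helpers could look like, in case it
helps.  Everything here is an assumption for illustration: 4 KiB pages, the
function names, and the idea that explicitly writing zeros is enough to get
the pages sent as zero pages in precopy.  It has not been tested against
this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages; adjust to the guest's real page size. */
#define GUEST_PAGE_SIZE 4096u

/*
 * Allocate npages page-aligned pages and explicitly write zeros to them,
 * so that each page is touched by the guest and holds only zeros.
 * Returns NULL on allocation failure; the caller frees with free().
 */
unsigned char *alloc_zero_pages(size_t npages)
{
    assert(npages > 0);
    size_t bytes = npages * GUEST_PAGE_SIZE;
    unsigned char *buf = aligned_alloc(GUEST_PAGE_SIZE, bytes);

    if (buf != NULL) {
        memset(buf, 0, bytes);
    }
    return buf;
}

/*
 * Read one byte from every page.  After postcopy starts, each read may
 * fault on the destination.  Returns how many pages had a non-zero
 * first byte (expected to be 0).
 */
size_t touch_pages(const volatile unsigned char *buf, size_t npages)
{
    size_t nonzero = 0;

    for (size_t i = 0; i < npages; i++) {
        if (buf[i * GUEST_PAGE_SIZE] != 0) {
            nonzero++;
        }
    }
    return nonzero;
}
```

A small main() in the guest would call alloc_zero_pages() with a page count
covering most of guest RAM, wait until the migration has switched to
postcopy, and then call touch_pages() in a loop while you watch whether the
vcpu thread blocks.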

Thanks,

-- 
Peter Xu


