From: Fabiano Rosas <farosas@suse.de>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>, Peter Xu <peterx@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [PULL 00/17] Migration patches for 2024-12-17
Date: Fri, 03 Jan 2025 19:34:08 -0300
Message-ID: <87pll37cin.fsf@suse.de>
In-Reply-To: <87sepz7guf.fsf@suse.de>

Fabiano Rosas <farosas@suse.de> writes:

> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> On Fri, 3 Jan 2025 at 13:32, Fabiano Rosas <farosas@suse.de> wrote:
>>>
>>> Thomas Huth <thuth@redhat.com> writes:
>>>
>>> > On 20/12/2024 17.28, Peter Xu wrote:
>>> >> On Thu, Dec 19, 2024 at 03:53:22PM -0300, Fabiano Rosas wrote:
>>> >>> Stefan Hajnoczi <stefanha@redhat.com> writes:
>>> >>>
>>> >>>> Hi Fabiano,
>>> >>>> Please take a look at this CI failure:
>>> >>>>
>>> >>>>>>> MALLOC_PERTURB_=61 QTEST_QEMU_BINARY=./qemu-system-s390x UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 QTEST_QEMU_IMG=./qemu-img MESON_TEST_ITERATION=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHON=/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh /home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
>>> >>>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>>> >>>> stderr:
>>> >>>> Traceback (most recent call last):
>>> >>>>    File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 688, in <module>
>>> >>>>      dump.read(dump_memory = args.memory)
>>> >>>>    File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 625, in read
>>> >>>>      section.read()
>>> >>>>    File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 461, in read
>>> >>>>      field['data'] = reader(field, self.file)
>>> >>>>    File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 434, in __init__
>>> >>>>      for field in self.desc['struct']['fields']:
>>> >>>> KeyError: 'fields'
>>> >>>
>>> >>> This is the command line that runs only this specific test:
>>> >>>
>>> >>> PYTHON=/usr/bin/python3.11 QTEST_QEMU_BINARY=./qemu-system-s390x
>>> >>> ./tests/qtest/migration-test -p /s390x/migration/analyze-script
>>> >>>
>>> >>> I cannot reproduce in migration-next nor in the detached HEAD that the
>>> >>> pipeline ran in (had to download the tarball from gitlab).
>>> >>>
>>> >>> The only s390 patch in this PR is one that I can test just fine with
>>> >>> TCG, so there shouldn't be any difference from KVM (i.e. there should be
>>> >>> no state being migrated with KVM that is not already migrated with TCG).
>>> >>>
>>> >>>> warning: fd: migration to a file is deprecated. Use file: instead.
>>> >>>> warning: fd: migration to a file is deprecated. Use file: instead.
>>> >>>
>>> >>> This is harmless.
>>> >>>
>>> >>>> **
>>> >>>> ERROR:../tests/qtest/migration-test.c:36:main: assertion failed (ret == 0): (1 == 0)
>>> >>>> (test program exited with status code -6)
>>> >>>
>>> >>> This is the assert at the end of the tests, irrelevant.
>>> >>>
>>> >>>>
>>> >>>> https://gitlab.com/qemu-project/qemu/-/jobs/8681858344#L8190
>>> >>>>
>>> >>>> If you find this pull request caused the failure, please send a new
>>> >>>> revision. Otherwise please let me know so we can continue to
>>> >>>> investigate.
>>> >>>
>>> >>> I don't have an s390x host at hand so the only thing I can do is to drop
>>> >>> that patch and hope that resolves the problem. @Peter, @Thomas, any
>>> >>> other ideas? Can you verify this on your end?
>>> >>
>>> >> Cannot reproduce either here, x86_64 host only.  The report was from s390
>>> >> host, though.  I'm not familiar with the s390 patch, I wonder if any of you
>>> >> could use plain brain power to figure more things out.
>>> >>
>>> >> We could wait for 1-2 more days to see whether Thomas can figure it out,
>>> >> hopefully easily reproducible on s390.. or we can also leave that for
>>> >> later.  And if the current issue on such fix is s390-host-only, might be
>>> >> easier to be picked up by s390 tree, perhaps?
>>> >
>>> > I tested migration-20241217-pull-request on a s390x (RHEL) host, but I
>>> > cannot reproduce the issue there - make check-qtest works without any
>>> > problems. Is it maybe related to that specific Ubuntu installation?
>>> >
>>>
>>> Since we cannot reproduce outside of the staging CI, could we run that
>>> job again with a diagnostic patch? Here's the rebased PR with the patch:
>>>
>>> https://gitlab.com/farosas/qemu/-/commits/migration-next
>>>
>>> (fork CI run: https://gitlab.com/farosas/qemu/-/pipelines/1610691202)
>>>
>>> Or should I just send a v2 of this PR with the debug patch?
>>
>> Here is the staging CI pipeline for your migration-next tree:
>> https://gitlab.com/qemu-project/qemu/-/pipelines/1610836485
>
> Great, thanks! Let's find out what is going on...
>

It seems the issue is here:

{"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}
                                                              ^
And in QEMU:

static const VMStateDescription vmstate_css = {
    .name = "s390_css",
    ...
->      VMSTATE_ARRAY_OF_POINTER_TO_STRUCT(css, ChannelSubSys, MAX_CSSID + 1,
                0, vmstate_css_img, CssImage),

Is it legal to have an empty array? I would assume so. Are we maybe
missing a .needed?
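
To make the failure mode concrete, here is a minimal sketch of why the empty
"struct" trips the script. It only mimics the lookup shown at line 434 of the
traceback; struct_field_names is a hypothetical helper, not code from
analyze-migration.py, and the second description is made up for contrast.

```python
# Field description copied from the migration stream JSON quoted above.
css_field = {"name": "css", "array_len": 256, "type": "struct",
             "struct": {}, "size": 1}

# Made-up description (assumption) of a struct that does carry "fields".
populated_field = {"name": "example", "type": "struct",
                   "struct": {"fields": [{"name": "x"}]}, "size": 1}


def struct_field_names(desc):
    """Hypothetical helper: same dictionary lookup as in the traceback."""
    return [field["name"] for field in desc["struct"]["fields"]]


print(struct_field_names(populated_field))  # ['x']

try:
    struct_field_names(css_field)
except KeyError as err:
    # The empty "struct": {} has no "fields" key at all.
    print("KeyError:", err)  # KeyError: 'fields'
```

So any code path that descends into this field fails. The open question is
why that path is only taken in the CI job.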

Comparing with a similar vmstate, spapr_llan/rx_pools on ppc
(-device spapr-vlan), what I see is:

{"name": "rx_pool", "array_len": 5, "type": "struct", "struct":
{"vmsd_name": "spapr_llan/rx_buffer_pool", ... }, "size": 32776}

So for CSS I'd expect:

-{"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}
+{"name": "css", "array_len": 256, "type": "struct", "struct": {"vmsd_name": "s390_css_img", ...}, "size": 1}
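
To check a whole JSON description for this pattern, something along these
lines could help. It is a rough sketch under assumptions: find_empty_structs
is a hypothetical helper, the sample document is invented for illustration
(only the "css" entry is taken from the output above), and the walk is
generic rather than tied to the real layout of the stream.

```python
import json


def find_empty_structs(node, path="$"):
    """Yield locations of struct-typed fields whose "struct" lacks "fields"."""
    if isinstance(node, dict):
        struct = node.get("struct")
        if node.get("type") == "struct" and (
                not isinstance(struct, dict) or "fields" not in struct):
            yield f"{path} (name={node.get('name')!r})"
        for key, value in node.items():
            yield from find_empty_structs(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_empty_structs(item, f"{path}[{index}]")


# Invented sample (assumption); only the "css" entry mirrors the real output.
sample = json.loads("""
{"fields": [
  {"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1},
  {"name": "ok", "type": "struct",
   "struct": {"vmsd_name": "example",
              "fields": [{"name": "x", "type": "uint8", "size": 1}]},
   "size": 1}
]}
""")

for location in find_empty_structs(sample):
    print(location)  # $.fields[0] (name='css')
```

Running it against the JSON from both the TCG run and the CI job would show
whether the empty struct is the only difference between the two.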

What is weird is that my TCG run also shows the empty struct, yet the
script doesn't seem to care. For some reason, in the CI job the script
parses further into the JSON.

If anyone spots something, let me know. I'll get back to this on Monday
with a fresh mind.

