From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: QEMU migration-test CI intermittent failure
Date: Thu, 14 Sep 2023 11:35:47 -0400
Message-ID: <ZQMoUzRH1BZKs39g@x1n>
In-Reply-To: <87edj0kcz7.fsf@suse.de>

On Thu, Sep 14, 2023 at 12:10:04PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Wed, Sep 13, 2023 at 04:42:31PM -0300, Fabiano Rosas wrote:
> >> Stefan Hajnoczi <stefanha@redhat.com> writes:
> >>
> >> > Hi,
> >> > The following intermittent failure occurred in the CI and I have filed
> >> > an Issue for it:
> >> > https://gitlab.com/qemu-project/qemu/-/issues/1886
> >> >
> >> > Output:
> >> >
> >> > >>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=116 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> >> > ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> >> > stderr:
> >> > qemu-system-x86_64: Unable to read from socket: Connection reset by peer
> >> > Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
> >> > **
> >> > ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
> >> > (test program exited with status code -6)
> >> >
> >> > You can find the full output here:
> >> > https://gitlab.com/qemu-project/qemu/-/jobs/5080200417
> >>
> >> This is the postcopy return path issue that I'm addressing here:
> >>
> >> https://lore.kernel.org/r/20230911171320.24372-1-farosas@suse.de
> >> Subject: [PATCH v6 00/10] Fix segfault on migration return path
> >> Message-ID: <20230911171320.24372-1-farosas@suse.de>
> >
> > Hmm I just noticed one thing, that Stefan's failure is a ram check issue
> > only, which means qemu won't crash?
> >
>
> The source could have crashed and left the migration at an inconsistent
> state and then the destination saw corrupted memory?
>
> > Fabiano, are you sure it's the same issue on your return-path fix?
> >
>
> I've been running the preempt tests on my branch for thousands of
> iterations and didn't see any other errors. Since there's no code going
> into the migration tree recently I assume it's the same error.
>
> I run the tests with GDB attached to QEMU, so I'll always see a crash
> before any memory corruption.

Okay, maybe that's what stops you from seeing the check_guests_ram() error
above?  It's worth checking whether it always fails differently if you just
don't attach gdb to it; I have a feeling that it'll always fail the other
way (I think migration-test will say something like "qemu killed" in most
cases), which should help to further identify the issues.

>
> > I'm also trying to reproduce either of them with some loads. I think I hit
> > some but it's very hard to reproduce solidly.
>
> Well, if you find anything else let me know and we'll fix it.

I think Stefan's issue is the one I triggered once, but only once; I did
see the check_guests_ram() lines.

I ran 10 migration-test instances concurrently (on 8 cores, just to make
the scheduler really work), each looping over preempt/plain 500 times, and
hit nothing.  I'm trying again on a larger host with more instances; so far
I've run 200 loops over 40 instances running together and still hit
nothing, but I'll keep trying.

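For anyone who wants to try the same kind of stress run, here is a minimal
sketch of the setup described above: N looping instances of one test command
started together, stopping each loop at its first failure.  It is not the
script I actually used.  The names run_loop() and stress() are made up for
illustration, and the command to run is left to the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_loop(cmd, loops, env=None):
    """Run cmd up to `loops` times in sequence.

    Returns (iteration, stderr_text) for the first failing run,
    or None if every run exited with status 0.
    """
    for i in range(1, loops + 1):
        result = subprocess.run(
            cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
        )
        if result.returncode != 0:
            return i, result.stderr.decode(errors="replace")
    return None


def stress(cmd, instances, loops, env=None):
    """Run `instances` copies of run_loop() concurrently.

    Returns a list of (instance, iteration, stderr_text), one entry
    per instance that hit a failure; an empty list means no failures.
    """
    with ThreadPoolExecutor(max_workers=instances) as pool:
        results = list(
            pool.map(lambda _: run_loop(cmd, loops, env), range(instances))
        )
    return [(n, r[0], r[1]) for n, r in enumerate(results) if r is not None]
```

A call along the lines of stress(cmd, instances=10, loops=500) would match
the first run above, with cmd being the migration-test binary plus GLib's
"-p" option to select a single test path, and env carrying QTEST_QEMU_BINARY
as in the CI command line.  The exact test path for preempt/plain is an
assumption to check against "migration-test -l" on your build.  Using more
instances than cores is deliberate, since the point is to make the scheduler
interleave the source and destination QEMUs unpredictably.
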
--
Peter Xu