From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: QEMU migration-test CI intermittent failure
Date: Thu, 14 Sep 2023 12:57:08 -0300
Message-ID: <87bke4kasr.fsf@suse.de>
In-Reply-To: <ZQMoUzRH1BZKs39g@x1n>

Peter Xu <peterx@redhat.com> writes:
> On Thu, Sep 14, 2023 at 12:10:04PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Wed, Sep 13, 2023 at 04:42:31PM -0300, Fabiano Rosas wrote:
>> >> Stefan Hajnoczi <stefanha@redhat.com> writes:
>> >>
>> >> > Hi,
>> >> > The following intermittent failure occurred in the CI and I have filed
>> >> > an Issue for it:
>> >> > https://gitlab.com/qemu-project/qemu/-/issues/1886
>> >> >
>> >> > Output:
>> >> >
>> >> > >>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=116 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
>> >> > ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
>> >> > stderr:
>> >> > qemu-system-x86_64: Unable to read from socket: Connection reset by peer
>> >> > Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
>> >> > **
>> >> > ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
>> >> > (test program exited with status code -6)
>> >> >
>> >> > You can find the full output here:
>> >> > https://gitlab.com/qemu-project/qemu/-/jobs/5080200417
>> >>
>> >> This is the postcopy return path issue that I'm addressing here:
>> >>
>> >> https://lore.kernel.org/r/20230911171320.24372-1-farosas@suse.de
>> >> Subject: [PATCH v6 00/10] Fix segfault on migration return path
>> >> Message-ID: <20230911171320.24372-1-farosas@suse.de>
>> >
>> > Hmm I just noticed one thing, that Stefan's failure is a ram check issue
>> > only, which means qemu won't crash?
>> >
>>
>> The source could have crashed and left the migration at an inconsistent
>> state and then the destination saw corrupted memory?
>>
>> > Fabiano, are you sure it's the same issue on your return-path fix?
>> >
>>
>> I've been running the preempt tests on my branch for thousands of
>> iterations and didn't see any other errors. Since there's no code going
>> into the migration tree recently I assume it's the same error.
>>
>> I run the tests with GDB attached to QEMU, so I'll always see a crash
>> before any memory corruption.
>
> Okay, maybe that stops you from seeing the above check_guests_ram() error?
> Worth checking whether it fails differently always if you just don't attach
> gdb to it; I had a feeling that it'll always fail in the other way (I think
> migration-test will say something like "qemu killed" etc. in most cases),
> further to identify the issues.
>
>>
>> > I'm also trying to reproduce either of them with some loads. I think I hit
>> > some but it's very hard to reproduce solidly.
>>
>> Well, if you find anything else let me know and we'll fix it.
>
> I think Stefan's issue is the one I triggered once, but only once; I did
> see check_guests_ram() lines.
>
> I ran concurrently 10 migration-tests (on 8 cores; just to make scheduler
> start to really work), each looping over preempt/plain for 500 times and
> hit nothing.. I'm trying again with a larger host with more instances, so
> far I've run 200 loops over 40 instances running together, I hit
> nothing.. but I'm keeping trying.

I managed to reproduce it. It's not the return path error. In hindsight
that's obvious, because that error happens in the 'recovery' test and this
one happens in the 'plain' test. Sorry about the noise.

This one reproduced with just 4 iterations of preempt/plain. I'll
investigate.
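
For anyone who wants to try reproducing this, the kind of stress loop
discussed above (several migration-test instances running concurrently,
each repeating one test many times) can be sketched roughly as below.
This is a minimal sketch, not necessarily what anyone in this thread
actually ran. The helper names run_loop and stress are hypothetical, and
the command in the trailing comment is an assumption pieced together from
the CI invocation quoted above: the test path passed to -p in particular
is assumed, so check it against what your build reports.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_loop(cmd, iterations, env=None):
    """Run cmd up to `iterations` times and stop at the first failure.

    Returns None if every run exited with status 0, otherwise a tuple
    (iteration, returncode, stderr_text) for the first failing run.
    """
    for i in range(1, iterations + 1):
        result = subprocess.run(
            cmd,
            env=env,  # None means: inherit the current environment
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        if result.returncode != 0:
            return i, result.returncode, result.stderr.decode(errors="replace")
    return None


def stress(cmd, instances, iterations, env=None):
    """Run `instances` copies of run_loop concurrently.

    Threads are enough here because each worker spends its time
    waiting on a child process. Returns the list of failures.
    """
    with ThreadPoolExecutor(max_workers=instances) as pool:
        futures = [
            pool.submit(run_loop, cmd, iterations, env)
            for _ in range(instances)
        ]
        results = [f.result() for f in futures]
    return [r for r in results if r is not None]


# Hypothetical usage, run from a QEMU build directory (paths and the
# test path given to -p are assumptions, adjust to your tree):
#
#   import os
#   env = dict(os.environ, QTEST_QEMU_BINARY="./qemu-system-x86_64")
#   cmd = ["./tests/qtest/migration-test",
#          "-p", "/x86_64/migration/postcopy/preempt/plain"]
#   for iteration, code, err in stress(cmd, instances=10, iterations=500, env=env):
#       print(f"failed on iteration {iteration} with status {code}", file=sys.stderr)
#       print(err, file=sys.stderr)
```

Running more instances than there are cores, as described above, is the
point of the exercise: it makes the scheduler interleave the source and
destination QEMU processes in ways a single quiet run rarely does.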