From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: recent flakiness (intermittent hangs) of migration-test
Date: Thu, 29 Oct 2020 16:28:10 -0400
Message-ID: <20201029202810.GD455015@xz-x1>
In-Reply-To: <20201029193433.GE3335@work-vm>

On Thu, Oct 29, 2020 at 07:34:33PM +0000, Dr. David Alan Gilbert wrote:
> > Here's qemu process 3514:
> > Thread 5 (Thread 0x3ff4affd910 (LWP 3628)):
> > #0  0x000003ff94c8d936 in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa26cd74dc)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ff94c8d936 in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa26cd7488, cond=0x2aa26cd74b0)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ff94c8d936 in __pthread_cond_wait
> > (cond=cond@entry=0x2aa26cd74b0, mutex=mutex@entry=0x2aa26cd7488)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa2497072c in qemu_sem_wait (sem=sem@entry=0x2aa26cd7488)
> > at ../../util/qemu-thread-posix.c:328
> > #4  0x000002aa244f4a02 in postcopy_pause (s=0x2aa26cd7000) at
> > ../../migration/migration.c:3192

So on src the migration thread is still sitting in postcopy_pause(), i.e. the
postcopy-paused state was never resumed, for some reason ...

> > #5  0x000002aa244f4a02 in migration_detect_error (s=0x2aa26cd7000) at
> > ../../migration/migration.c:3255
> > #6  0x000002aa244f4a02 in migration_thread
> > (opaque=opaque@entry=0x2aa26cd7000) at
> > ../../migration/migration.c:3564
> > #7  0x000002aa2496fa3a in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:521
> > #8  0x000003ff94c87aa8 in start_thread (arg=0x3ff4affd910) at
> > pthread_create.c:463
> > #9  0x000003ff94b7a896 in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

[...]

> > And here's 3528:
> > Thread 6 (Thread 0x3ff6ccfd910 (LWP 3841)):
> > #0  0x000003ffb1b8d936 in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa387a6aac)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ffb1b8d936 in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa387a6a58, cond=0x2aa387a6a80)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ffb1b8d936 in __pthread_cond_wait
> > (cond=cond@entry=0x2aa387a6a80, mutex=mutex@entry=0x2aa387a6a58)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa36bf072c in qemu_sem_wait (sem=sem@entry=0x2aa387a6a58)
> > at ../../util/qemu-thread-posix.c:328
> > #4  0x000002aa366c369a in postcopy_pause_incoming (mis=<optimized
> > out>) at ../../migration/savevm.c:2541

Same on the destination side.

> > #5  0x000002aa366c369a in qemu_loadvm_state_main
> > (f=f@entry=0x2aa38897930, mis=mis@entry=0x2aa387a6820)
> >     at ../../migration/savevm.c:2615
> > #6  0x000002aa366c44fa in postcopy_ram_listen_thread
> > (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1830
> > #7  0x000002aa36befa3a in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:521
> > #8  0x000003ffb1b87aa8 in start_thread (arg=0x3ff6ccfd910) at
> > pthread_create.c:463
> > #9  0x000003ffb1a7a896 in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
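To spell out what the two traces have in common, here is a toy model in
Python. It is not QEMU code: the function and variable names are made up for
illustration, and the timeout exists only so that the demo terminates (the
real waits above have abstime=0x0, i.e. they block forever). Each side parks
on its own semaphore, and if nothing ever posts them, both stay paused:

```python
import threading


def paused_side(name, sem, timeout, results):
    # Stand-in for a thread parked in a pause routine: block until
    # somebody posts the semaphore, or give up after `timeout` seconds.
    woke = sem.acquire(timeout=timeout)
    results[name] = "resumed" if woke else "still paused"


def simulate(post_resume, timeout=0.2):
    # One semaphore per side, both starting at zero (nothing posted yet).
    src_sem = threading.Semaphore(0)
    dst_sem = threading.Semaphore(0)
    results = {}
    threads = [
        threading.Thread(target=paused_side,
                         args=("src", src_sem, timeout, results)),
        threading.Thread(target=paused_side,
                         args=("dst", dst_sem, timeout, results)),
    ]
    for t in threads:
        t.start()
    if post_resume:
        # Whatever is supposed to wake the two sides posts both semaphores.
        src_sem.release()
        dst_sem.release()
    for t in threads:
        t.join()
    return results
```

simulate(False) models the hang seen here (neither side is ever woken), while
simulate(True) models the case where both sides get posted and carry on.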

Peter, could you set QTEST_LOG=1 for your future migration-test runs and
capture the stderr?  With commit a47295014d ("migration-test: Only hide error
if !QTEST_LOG", 2020-10-26), the test should then dump quite a bit of helpful
information that would help to narrow down the issue.
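
In case it is useful, a minimal sketch of one way to do the capture from a
script. The helper name run_with_qtest_log is made up; the only thing taken
from this thread is the QTEST_LOG=1 setting, and everything else (how the test
binary is invoked, where the log goes) is an assumption to adapt to your setup:

```python
import os
import subprocess


def run_with_qtest_log(cmd, stderr_path, extra_env=None):
    """Run `cmd` with QTEST_LOG=1 set and write its stderr to `stderr_path`.

    Returns the exit status of the command.
    """
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    # Set last so that the logging switch cannot be overridden by extra_env.
    env["QTEST_LOG"] = "1"
    with open(stderr_path, "wb") as log:
        proc = subprocess.run(cmd, env=env, stderr=log)
    return proc.returncode
```

For example, something like run_with_qtest_log(["tests/qtest/migration-test"],
"migration-test.stderr", {"QTEST_QEMU_BINARY": "./qemu-system-s390x"}), where
both paths are guesses about the build tree rather than known-good values.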

I'll also try to find another s390 host and reproduce it on my side.

Thanks,

-- 
Peter Xu


