qemu-devel.nongnu.org archive mirror
From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: "Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>
Subject: Re: recent flakiness (intermittent hangs) of migration-test
Date: Mon, 02 Nov 2020 15:19:50 +0100
Message-ID: <2235778.NHLJzTTKgb@silver>
In-Reply-To: <f3a379ca-5c7b-0c19-b0ea-6354c460eff3@redhat.com>

On Monday, 2 November 2020 14:55:04 CET Philippe Mathieu-Daudé wrote:
> On 10/30/20 2:53 PM, Peter Xu wrote:
> > On Fri, Oct 30, 2020 at 11:48:28AM +0000, Peter Maydell wrote:
> >>> Peter, is it possible that you enable QTEST_LOG=1 in your future
> >>> migration-test testcase and try to capture the stderr?  With the help
> >>> of commit a47295014d ("migration-test: Only hide error if !QTEST_LOG",
> >>> 2020-10-26), the test should be able to dump quite some helpful
> >>> information to further identify the issue.
> >> 
> >> Here's the result of running just the migration test with
> >> QTEST_LOG=1:
> >> https://people.linaro.org/~peter.maydell/migration.log
> >> It's 300MB because when the test hangs one of the processes
> >> is apparently in a polling state and continues to send status
> >> queries.
> >> 
> >> My impression is that the test is OK on an unloaded machine but
> >> more likely to fail if the box is doing other things at the
> >> same time. Alternatively it might be a 'parallel make check' bug.
> > 
> > Thanks for collecting that, Peter.
> > 
> > I'm copy-pasting the important information out here (with some moves and
> > indents to make things even clearer):
> > 
> > ...
> > {"execute": "migrate-recover", "arguments": {"uri":
> > "unix:/tmp/migration-test-nGzu4q/migsocket-recover"}, "id": "recover-cmd"}
> > {"timestamp": {"seconds": 1604056292, "microseconds": 177955},
> > "event": "MIGRATION", "data": {"status": "setup"}}
> > {"return": {}, "id": "recover-cmd"}
> > {"execute": "query-migrate"}
> > ...
> > {"execute": "migrate", "arguments": {"resume": true, "uri":
> > "unix:/tmp/migration-test-nGzu4q/migsocket-recover"}}
> > qemu-system-x86_64: ram_save_queue_pages no previous block
> > qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> > {"return": {}}
> > {"execute": "migrate-set-parameters", "arguments":
> > {"max-postcopy-bandwidth": 0}}
> > ...
> > 
> > The problem is probably a misuse of last_rb on the destination node.  When
> > looking at it, I also found a race.  So I guess I should fix both...
> > 
> > Peter, would it be easy to try applying the two patches I attached to see
> > whether the test hang would be resolved?  Dave, feel free to give early
> > comments too on the two fixes before I post them on the list.
> 
> Per this comment:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg756235.html
> You could add:
> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>

Yes, you can do that.

We've tested extensively with Peter Xu's patches over the last couple of days
on various systems and haven't encountered any further lockups since then.

Best regards,
Christian Schoenebeck
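
Since the QTEST_LOG=1 capture quoted above is around 300MB, here is a minimal
sketch of how the MIGRATION status events could be pulled out of such a log.
It is not part of QEMU or its test suite. The function names are hypothetical,
and it assumes that each QMP message in the raw log sits on a single line
(possibly sharing that line with other messages or plain stderr text).

```python
import json


def iter_json_objects(line):
    """Yield each top-level JSON object embedded in one log line.

    Non-JSON text (e.g. "qemu-system-x86_64: ..." stderr messages) is skipped.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = line.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode returns the object and the index where it ended.
            obj, end = decoder.raw_decode(line, start)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next candidate.
            pos = start + 1
            continue
        yield obj
        pos = end


def migration_statuses(lines):
    """Return the "status" of every MIGRATION event, in log order.

    Assumption: events look like the one quoted in this thread, i.e.
    {"event": "MIGRATION", "data": {"status": ...}, ...}.
    """
    statuses = []
    for line in lines:
        for obj in iter_json_objects(line):
            if obj.get("event") == "MIGRATION":
                statuses.append(obj.get("data", {}).get("status"))
    return statuses
```

Passing an open file object (for example a local copy of migration.log) as
`lines` keeps memory use low, because the file is read one line at a time.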




Thread overview: 10+ messages
2020-10-29 17:20 recent flakiness (intermittent hangs) of migration-test Peter Maydell
2020-10-29 17:41 ` Dr. David Alan Gilbert
2020-10-29 18:55   ` Peter Maydell
2020-10-29 19:34     ` Dr. David Alan Gilbert
2020-10-29 20:28       ` Peter Xu
2020-10-30 11:48         ` Peter Maydell
2020-10-30 13:53           ` Peter Xu
2020-11-02 13:55             ` Philippe Mathieu-Daudé
2020-11-02 14:19               ` Christian Schoenebeck [this message]
2020-11-02 15:14                 ` Peter Xu
