From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Lukas Straub <lukasstraub2@web.de>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
qemu-block <qemu-block@nongnu.org>,
"Juan Quintela" <quintela@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Max Reitz" <mreitz@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu
Date: Tue, 12 May 2020 10:09:09 +0100 [thread overview]
Message-ID: <20200512090909.GB2802@work-vm> (raw)
In-Reply-To: <20200511201218.76f30e0b@luklap>
* Lukas Straub (lukasstraub2@web.de) wrote:
> On Mon, 11 May 2020 12:49:47 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.
> >
> > If qemu as a whole hangs due to a stalled network connection, that is a
> > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > I/O in general, such that if the network connection or remote server stalls,
> > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > loop.
> >
> > There are places in QEMU code which are not well behaved in this respect,
> > but many are, and others are getting fixed where found to be important.
> >
> > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > event of a stalled network should be considered a security flaw, because
> > the network is untrusted in general.
>
> The fact that out-of-band qmp commands exist at all shows that we have to make tradeoffs of developer time vs. doing things right. Sure, the migration code can be rewritten to use non-blocking i/o and finegrained locks. But as a hobbyist I don't have time to fix this.
If it was just an hobbyist vs fulltime thing then I'd say that was a bad
way to make the decision; however the reality is that even those who are
paid to watch this code don't have the feeling we can make it fail
quickly/non-blocking - and for COLO you need to be absolutely sure you
nail every case, so you'd some how have to audit the whole lot, and keep
watching it.
(and thank you for taking your time to do this!)
Dave
> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.
> >
> > IIUC, invoking the "yank" command unconditionally kills every single
> > network connection in QEMU that has registered with the "yank" subsystem.
> > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > QEMU not handling stalled networking well.
> >
> > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > connection used for the guest disk, we needlessly break I/O.
>
> Yeah, these patches are intended to solve the problems with the colo use-case where all external connections (migration, chardevs, nbd) are just for replication. In other use-cases you'd enable the yank feature only on the non-essential connections.
>
> > eg doing this in the chardev backend is not desirable, because the bugs
> > with hanging QEMU are typically caused by the way the frontend device
> > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> >
> >
> > Regards,
> > Daniel
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2020-05-12 9:10 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-11 11:14 [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-05-11 11:14 ` [PATCH 1/5] Introduce yank feature Lukas Straub
2020-05-11 11:14 ` [PATCH 2/5] io/channel.c,io/channel-socket.c: Add " Lukas Straub
2020-05-11 11:51 ` Daniel P. Berrangé
2020-05-11 20:00 ` Lukas Straub
2020-05-12 8:52 ` Daniel P. Berrangé
2020-05-11 11:14 ` [PATCH 3/5] block/nbd.c: " Lukas Straub
2020-05-11 16:19 ` Dr. David Alan Gilbert
2020-05-11 17:05 ` Lukas Straub
2020-05-11 17:24 ` Dr. David Alan Gilbert
2020-05-12 8:54 ` Daniel P. Berrangé
2020-05-15 9:48 ` Lukas Straub
2020-05-15 10:04 ` Daniel P. Berrangé
2020-05-15 10:14 ` Lukas Straub
2020-05-15 10:26 ` Daniel P. Berrangé
2020-05-15 13:03 ` Lukas Straub
2020-05-15 13:45 ` Daniel P. Berrangé
2020-05-11 11:14 ` [PATCH 4/5] chardev/char-socket.c: " Lukas Straub
2020-05-12 8:56 ` Daniel P. Berrangé
2020-05-11 11:14 ` [PATCH 5/5] migration: " Lukas Straub
2020-05-11 11:49 ` [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu Daniel P. Berrangé
2020-05-11 12:07 ` Dr. David Alan Gilbert
2020-05-11 12:17 ` Daniel P. Berrangé
2020-05-11 15:46 ` Dr. David Alan Gilbert
2020-05-12 9:32 ` Lukas Straub
2020-05-12 9:43 ` Daniel P. Berrangé
2020-05-12 18:58 ` Dr. David Alan Gilbert
2020-05-13 8:41 ` Daniel P. Berrangé
2020-05-12 19:42 ` Lukas Straub
2020-05-13 10:32 ` Kevin Wolf
2020-05-13 10:53 ` Dr. David Alan Gilbert
2020-05-13 11:13 ` Kevin Wolf
2020-05-13 11:26 ` Daniel P. Berrangé
2020-05-13 11:58 ` Paolo Bonzini
2020-05-13 12:25 ` Dr. David Alan Gilbert
2020-05-13 12:32 ` Kevin Wolf
2020-05-13 12:18 ` Kevin Wolf
2020-05-13 12:56 ` Dr. David Alan Gilbert
2020-05-13 13:08 ` Daniel P. Berrangé
2020-05-13 13:48 ` Dr. David Alan Gilbert
2020-05-13 13:57 ` Eric Blake
2020-05-13 14:06 ` Kevin Wolf
2020-05-13 14:18 ` Eric Blake
2020-09-01 10:35 ` Markus Armbruster
2020-05-11 19:41 ` Lukas Straub
2020-05-11 18:12 ` Lukas Straub
2020-05-12 9:03 ` Daniel P. Berrangé
2020-05-12 9:06 ` Dr. David Alan Gilbert
2020-05-12 9:09 ` Dr. David Alan Gilbert [this message]
2020-05-13 19:12 ` Lukas Straub
2020-05-14 10:41 ` Kevin Wolf
2020-05-14 11:01 ` Dr. David Alan Gilbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200512090909.GB2802@work-vm \
--to=dgilbert@redhat.com \
--cc=berrange@redhat.com \
--cc=kwolf@redhat.com \
--cc=lukasstraub2@web.de \
--cc=marcandre.lureau@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).