* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression [not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com> @ 2011-09-20 18:01 ` Eric Blake 2011-09-20 18:52 ` Anthony Liguori ` (2 more replies) [not found] ` <20110920180649.GD4121@redhat.com> 1 sibling, 3 replies; 7+ messages in thread From: Eric Blake @ 2011-09-20 18:01 UTC (permalink / raw) To: libvir-list, QEMU Developers On 09/20/2011 11:39 AM, Jiri Denemark wrote: > The commit that prevents disk corruption on domain shutdown > (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU > 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed > only recently in QEMU git. With affected QEMU binaries, domains cannot > be shutdown properly and stay in a paused state. This patch tries to > avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we > wait a bit more between sending SIGTERM and SIGKILL to reduce the > possibility of virtual disk corruption. > --- > src/qemu/qemu_capabilities.c | 7 +++++++ > src/qemu/qemu_capabilities.h | 1 + > src/qemu/qemu_process.c | 19 +++++++++++++------ > 3 files changed, 21 insertions(+), 6 deletions(-) ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu. That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression 2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake @ 2011-09-20 18:52 ` Anthony Liguori 2011-09-20 19:03 ` Eric Blake 2011-09-21 9:24 ` Daniel P. Berrange 2011-09-21 10:30 ` Kevin Wolf 2 siblings, 1 reply; 7+ messages in thread From: Anthony Liguori @ 2011-09-20 18:52 UTC (permalink / raw) To: Eric Blake; +Cc: libvir-list, QEMU Developers On 09/20/2011 01:01 PM, Eric Blake wrote: > On 09/20/2011 11:39 AM, Jiri Denemark wrote: >> The commit that prevents disk corruption on domain shutdown >> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU >> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed >> only recently in QEMU git. With affected QEMU binaries, domains cannot >> be shutdown properly and stay in a paused state. This patch tries to >> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we >> wait a bit more between sending SIGTERM and SIGKILL to reduce the >> possibility of virtual disk corruption. >> --- >> src/qemu/qemu_capabilities.c | 7 +++++++ >> src/qemu/qemu_capabilities.h | 1 + >> src/qemu/qemu_process.c | 19 +++++++++++++------ >> 3 files changed, 21 insertions(+), 6 deletions(-) > > ACK. But it would be nice if upstream qemu could give us a more reliable > indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt > data on a patched 0.14 or 0.15 qemu. Can you be a lot more specific about what bug you mean? > That is, as part of fixing the bug in qemu, > we should also update -help text or something similar, so that libvirt can avoid > making decisions solely on version numbers. The version number *is* the right way to make decisions. We've gone through this dozens of times. The fact that distros backport all sorts of stuff means that you need to maintain a matrix of versions with features. It's not our (upstream QEMU's) responsibility to tell you the differences that exist in forks of QEMU. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression 2011-09-20 18:52 ` Anthony Liguori @ 2011-09-20 19:03 ` Eric Blake 2011-09-20 20:12 ` Anthony Liguori 0 siblings, 1 reply; 7+ messages in thread From: Eric Blake @ 2011-09-20 19:03 UTC (permalink / raw) To: Anthony Liguori; +Cc: libvir-list, QEMU Developers On 09/20/2011 12:52 PM, Anthony Liguori wrote: > On 09/20/2011 01:01 PM, Eric Blake wrote: >> On 09/20/2011 11:39 AM, Jiri Denemark wrote: >>> The commit that prevents disk corruption on domain shutdown >>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU >>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed >>> only recently in QEMU git. With affected QEMU binaries, domains cannot >>> be shutdown properly and stay in a paused state. This patch tries to >>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we >>> wait a bit more between sending SIGTERM and SIGKILL to reduce the >>> possibility of virtual disk corruption. >>> --- >>> src/qemu/qemu_capabilities.c | 7 +++++++ >>> src/qemu/qemu_capabilities.h | 1 + >>> src/qemu/qemu_process.c | 19 +++++++++++++------ >>> 3 files changed, 21 insertions(+), 6 deletions(-) >> >> ACK. But it would be nice if upstream qemu could give us a more reliable >> indication of whether the qemu SIGTERM bug is fixed, so that we don't >> corrupt >> data on a patched 0.14 or 0.15 qemu. > > Can you be a lot more specific about what bug you mean? > https://bugzilla.redhat.com/show_bug.cgi?id=739895 >> That is, as part of fixing the bug in qemu, >> we should also update -help text or something similar, so that libvirt >> can avoid >> making decisions solely on version numbers. > > The version number *is* the right way to make decisions. We've gone > through this dozens of times. > > The fact that distros backport all sorts of stuff means that you need to > maintain a matrix of versions with features. It's not our (upstream > QEMU's) responsibility to tell you the differences that exist in forks > of QEMU. Version numbers are lousy, precisely because they are not granular enough. That's why the autoconf philosophy frowns so heavily on version checks, and prefers feature checks instead. We want to know which features are present, not which versions introduced which features. In this case, we want to know about a particular feature (SIGTERM is not broken), which we know exists later than 0.15, but which might also exist as a backport in 0.14 or 0.15. If qemu tells us that information, then upstream libvirt can make the decision correctly regardless of how distros backport the patch. But if qemu does not expose the information, then upstream libvirt must be pessimistic, and you've now forced the distros to do double-duty - they must backport both the qemu fix, and write a distro-specific libvirt patch that alters the version matrix to play with the distro build of qemu. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression 2011-09-20 19:03 ` Eric Blake @ 2011-09-20 20:12 ` Anthony Liguori 0 siblings, 0 replies; 7+ messages in thread From: Anthony Liguori @ 2011-09-20 20:12 UTC (permalink / raw) To: Eric Blake; +Cc: libvir-list, QEMU Developers On 09/20/2011 02:03 PM, Eric Blake wrote: > On 09/20/2011 12:52 PM, Anthony Liguori wrote: >> On 09/20/2011 01:01 PM, Eric Blake wrote: >>> On 09/20/2011 11:39 AM, Jiri Denemark wrote: >>>> The commit that prevents disk corruption on domain shutdown >>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU >>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed >>>> only recently in QEMU git. With affected QEMU binaries, domains cannot >>>> be shutdown properly and stay in a paused state. This patch tries to >>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we >>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the >>>> possibility of virtual disk corruption. >>>> --- >>>> src/qemu/qemu_capabilities.c | 7 +++++++ >>>> src/qemu/qemu_capabilities.h | 1 + >>>> src/qemu/qemu_process.c | 19 +++++++++++++------ >>>> 3 files changed, 21 insertions(+), 6 deletions(-) >>> >>> ACK. But it would be nice if upstream qemu could give us a more reliable >>> indication of whether the qemu SIGTERM bug is fixed, so that we don't >>> corrupt >>> data on a patched 0.14 or 0.15 qemu. >> >> Can you be a lot more specific about what bug you mean? >> > > https://bugzilla.redhat.com/show_bug.cgi?id=739895 That just got applied, last week, so no, it's not in any release right now. > >>> That is, as part of fixing the bug in qemu, >>> we should also update -help text or something similar, so that libvirt >>> can avoid >>> making decisions solely on version numbers. >> >> The version number *is* the right way to make decisions. We've gone >> through this dozens of times. >> >> The fact that distros backport all sorts of stuff means that you need to >> maintain a matrix of versions with features. It's not our (upstream >> QEMU's) responsibility to tell you the differences that exist in forks >> of QEMU. > > Version numbers are lousy, precisely because they are not granular enough. > That's why the autoconf philosophy frowns so heavily on version checks, and > prefers feature checks instead. > > We want to know which features are present, Features and bugs are different things. I'm all for providing ways to detect whether we support certain commands in QMP, command line options, etc. not which versions introduced which > features. In this case, we want to know about a particular feature (SIGTERM is > not broken), which we know exists later than 0.15, but which might also exist as > a backport in 0.14 or 0.15. No, you want to know, does d9389b9664df561db796b18eb8309fffe58faf8b existing in this build of QEMU. But makes d9389b more important than d296363 or db118fe72? If you want to know whether a bug is fixed that is important to *you*, then you should check the git log correlating to that version and embed that info in libvirt. Then libvirt is entirely empowered to deem whatever bug fixes you think are important to the table that you maintain. > If qemu tells us that information, then upstream > libvirt can make the decision correctly regardless of how distros backport the > patch. But if qemu does not expose the information, then upstream libvirt must > be pessimistic, and you've now forced the distros to do double-duty - they must > backport both the qemu fix, and write a distro-specific libvirt patch that > alters the version matrix to play with the distro build of qemu. Or distros could use use the QEMU stable branch as their base and invest in backporting to QEMU stable instead of maintaining private backport trees. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression 2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake 2011-09-20 18:52 ` Anthony Liguori @ 2011-09-21 9:24 ` Daniel P. Berrange 2011-09-21 10:30 ` Kevin Wolf 2 siblings, 0 replies; 7+ messages in thread From: Daniel P. Berrange @ 2011-09-21 9:24 UTC (permalink / raw) To: Eric Blake; +Cc: libvir-list, QEMU Developers On Tue, Sep 20, 2011 at 12:01:58PM -0600, Eric Blake wrote: > On 09/20/2011 11:39 AM, Jiri Denemark wrote: > >The commit that prevents disk corruption on domain shutdown > >(96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU > >0.14.* and 0.15.* because of a regression bug in QEMU that was fixed > >only recently in QEMU git. With affected QEMU binaries, domains cannot > >be shutdown properly and stay in a paused state. This patch tries to > >avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we > >wait a bit more between sending SIGTERM and SIGKILL to reduce the > >possibility of virtual disk corruption. > >--- > > src/qemu/qemu_capabilities.c | 7 +++++++ > > src/qemu/qemu_capabilities.h | 1 + > > src/qemu/qemu_process.c | 19 +++++++++++++------ > > 3 files changed, 21 insertions(+), 6 deletions(-) > > ACK. But it would be nice if upstream qemu could give us a more > reliable indication of whether the qemu SIGTERM bug is fixed, so > that we don't corrupt data on a patched 0.14 or 0.15 qemu. That is, > as part of fixing the bug in qemu, we should also update -help text > or something similar, so that libvirt can avoid making decisions > solely on version numbers. -help is traditionally just a way to detecting features. I think it would be rather an abuse of -help to start adding information about every bugfix that affects mgmt apps too. So I think a version number check is sufficient in this case. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression 2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake 2011-09-20 18:52 ` Anthony Liguori 2011-09-21 9:24 ` Daniel P. Berrange @ 2011-09-21 10:30 ` Kevin Wolf 2 siblings, 0 replies; 7+ messages in thread From: Kevin Wolf @ 2011-09-21 10:30 UTC (permalink / raw) To: Eric Blake; +Cc: libvir-list, Justin M. Forbes, QEMU Developers Am 20.09.2011 20:01, schrieb Eric Blake: > On 09/20/2011 11:39 AM, Jiri Denemark wrote: >> The commit that prevents disk corruption on domain shutdown >> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU >> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed >> only recently in QEMU git. With affected QEMU binaries, domains cannot >> be shutdown properly and stay in a paused state. This patch tries to >> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we >> wait a bit more between sending SIGTERM and SIGKILL to reduce the >> possibility of virtual disk corruption. I really think libvirt should never SIGKILL qemu unless it's explicitly told so. Management tools should try to ask the user before doing so. Killing qemu with SIGKILL is never safe. >> --- >> src/qemu/qemu_capabilities.c | 7 +++++++ >> src/qemu/qemu_capabilities.h | 1 + >> src/qemu/qemu_process.c | 19 +++++++++++++------ >> 3 files changed, 21 insertions(+), 6 deletions(-) > > ACK. But it would be nice if upstream qemu could give us a more > reliable indication of whether the qemu SIGTERM bug is fixed, so that we > don't corrupt data on a patched 0.14 or 0.15 qemu. 0.14 shouldn't have this bug, but it looks like 0.15 has it. Justin, can you cherry-pick d9389b96 and 941f511a into stable-0.15 in order to fix -no-shutdown? Kevin ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20110920180649.GD4121@redhat.com>]
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression [not found] ` <20110920180649.GD4121@redhat.com> @ 2011-09-20 18:19 ` Eric Blake 0 siblings, 0 replies; 7+ messages in thread From: Eric Blake @ 2011-09-20 18:19 UTC (permalink / raw) To: Dave Allan; +Cc: libvir-list, QEMU Developers On 09/20/2011 12:06 PM, Dave Allan wrote: > On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote: >> The commit that prevents disk corruption on domain shutdown >> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU >> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed >> only recently in QEMU git. With affected QEMU binaries, domains cannot >> be shutdown properly and stay in a paused state. This patch tries to >> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we >> wait a bit more between sending SIGTERM and SIGKILL to reduce the >> possibility of virtual disk corruption. > > IMO, SIGKILL should only be sent at the explicit direction of the > user, saying in effect, I'm ok with possible data corruption, I want > the VM killed unconditionally. I would rather leave VMs paused than > risk corrupting data. Let's get as much input as we can from the qemu > folks before we go down this path. That re-echos my sentiment that qemu needs to tell us whether the bug is fixed (we know that if version < 0.14, the bug is not present, and if version > 0.15, the bug is fixed, but it is the 0.1[45] window where we don't know if the vendor has back-ported the fix into the version of qemu that we are targetting, unless we get some help from qemu). I also wonder if we should make it so: virDomainDestroy(dom) fails with a reasonable message, rather than leaving the domain paused, if we think qemu has the bug, and require the user to do virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the means of the user explicitly requesting that they work around the qemu bug. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-09-21 10:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com> 2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake 2011-09-20 18:52 ` Anthony Liguori 2011-09-20 19:03 ` Eric Blake 2011-09-20 20:12 ` Anthony Liguori 2011-09-21 9:24 ` Daniel P. Berrange 2011-09-21 10:30 ` Kevin Wolf [not found] ` <20110920180649.GD4121@redhat.com> 2011-09-20 18:19 ` Eric Blake
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).