* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
[not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com>
@ 2011-09-20 18:01 ` Eric Blake
2011-09-20 18:52 ` Anthony Liguori
` (2 more replies)
[not found] ` <20110920180649.GD4121@redhat.com>
1 sibling, 3 replies; 7+ messages in thread
From: Eric Blake @ 2011-09-20 18:01 UTC (permalink / raw)
To: libvir-list, QEMU Developers
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
> The commit that prevents disk corruption on domain shutdown
> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
> only recently in QEMU git. With affected QEMU binaries, domains cannot
> be shutdown properly and stay in a paused state. This patch tries to
> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
> wait a bit more between sending SIGTERM and SIGKILL to reduce the
> possibility of virtual disk corruption.
> ---
> src/qemu/qemu_capabilities.c | 7 +++++++
> src/qemu/qemu_capabilities.h | 1 +
> src/qemu/qemu_process.c | 19 +++++++++++++------
> 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more
reliable indication of whether the qemu SIGTERM bug is fixed, so that we
don't corrupt data on a patched 0.14 or 0.15 qemu. That is, as part of
fixing the bug in qemu, we should also update -help text or something
similar, so that libvirt can avoid making decisions solely on version
numbers.
--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
[not found] ` <20110920180649.GD4121@redhat.com>
@ 2011-09-20 18:19 ` Eric Blake
0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2011-09-20 18:19 UTC (permalink / raw)
To: Dave Allan; +Cc: libvir-list, QEMU Developers
On 09/20/2011 12:06 PM, Dave Allan wrote:
> On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.
>
> IMO, SIGKILL should only be sent at the explicit direction of the
> user, saying in effect, I'm ok with possible data corruption, I want
> the VM killed unconditionally. I would rather leave VMs paused than
> risk corrupting data. Let's get as much input as we can from the qemu
> folks before we go down this path.
That re-echos my sentiment that qemu needs to tell us whether the bug is
fixed (we know that if version < 0.14, the bug is not present, and if
version > 0.15, the bug is fixed, but it is the 0.1[45] window where we
don't know if the vendor has back-ported the fix into the version of
qemu that we are targetting, unless we get some help from qemu).
I also wonder if we should make it so:
virDomainDestroy(dom) fails with a reasonable message, rather than
leaving the domain paused, if we think qemu has the bug, and require the
user to do virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the
means of the user explicitly requesting that they work around the qemu bug.
--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
@ 2011-09-20 18:52 ` Anthony Liguori
2011-09-20 19:03 ` Eric Blake
2011-09-21 9:24 ` Daniel P. Berrange
2011-09-21 10:30 ` Kevin Wolf
2 siblings, 1 reply; 7+ messages in thread
From: Anthony Liguori @ 2011-09-20 18:52 UTC (permalink / raw)
To: Eric Blake; +Cc: libvir-list, QEMU Developers
On 09/20/2011 01:01 PM, Eric Blake wrote:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.
>> ---
>> src/qemu/qemu_capabilities.c | 7 +++++++
>> src/qemu/qemu_capabilities.h | 1 +
>> src/qemu/qemu_process.c | 19 +++++++++++++------
>> 3 files changed, 21 insertions(+), 6 deletions(-)
>
> ACK. But it would be nice if upstream qemu could give us a more reliable
> indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt
> data on a patched 0.14 or 0.15 qemu.
Can you be a lot more specific about what bug you mean?
> That is, as part of fixing the bug in qemu,
> we should also update -help text or something similar, so that libvirt can avoid
> making decisions solely on version numbers.
The version number *is* the right way to make decisions. We've gone through
this dozens of times.
The fact that distros backport all sorts of stuff means that you need to
maintain a matrix of versions with features. It's not our (upstream QEMU's)
responsibility to tell you the differences that exist in forks of QEMU.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
2011-09-20 18:52 ` Anthony Liguori
@ 2011-09-20 19:03 ` Eric Blake
2011-09-20 20:12 ` Anthony Liguori
0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2011-09-20 19:03 UTC (permalink / raw)
To: Anthony Liguori; +Cc: libvir-list, QEMU Developers
On 09/20/2011 12:52 PM, Anthony Liguori wrote:
> On 09/20/2011 01:01 PM, Eric Blake wrote:
>> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>>> The commit that prevents disk corruption on domain shutdown
>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>>> be shutdown properly and stay in a paused state. This patch tries to
>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>>> possibility of virtual disk corruption.
>>> ---
>>> src/qemu/qemu_capabilities.c | 7 +++++++
>>> src/qemu/qemu_capabilities.h | 1 +
>>> src/qemu/qemu_process.c | 19 +++++++++++++------
>>> 3 files changed, 21 insertions(+), 6 deletions(-)
>>
>> ACK. But it would be nice if upstream qemu could give us a more reliable
>> indication of whether the qemu SIGTERM bug is fixed, so that we don't
>> corrupt
>> data on a patched 0.14 or 0.15 qemu.
>
> Can you be a lot more specific about what bug you mean?
>
https://bugzilla.redhat.com/show_bug.cgi?id=739895
>> That is, as part of fixing the bug in qemu,
>> we should also update -help text or something similar, so that libvirt
>> can avoid
>> making decisions solely on version numbers.
>
> The version number *is* the right way to make decisions. We've gone
> through this dozens of times.
>
> The fact that distros backport all sorts of stuff means that you need to
> maintain a matrix of versions with features. It's not our (upstream
> QEMU's) responsibility to tell you the differences that exist in forks
> of QEMU.
Version numbers are lousy, precisely because they are not granular
enough. That's why the autoconf philosophy frowns so heavily on version
checks, and prefers feature checks instead.
We want to know which features are present, not which versions
introduced which features. In this case, we want to know about a
particular feature (SIGTERM is not broken), which we know exists later
than 0.15, but which might also exist as a backport in 0.14 or 0.15. If
qemu tells us that information, then upstream libvirt can make the
decision correctly regardless of how distros backport the patch. But if
qemu does not expose the information, then upstream libvirt must be
pessimistic, and you've now forced the distros to do double-duty - they
must backport both the qemu fix, and write a distro-specific libvirt
patch that alters the version matrix to play with the distro build of qemu.
--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
2011-09-20 19:03 ` Eric Blake
@ 2011-09-20 20:12 ` Anthony Liguori
0 siblings, 0 replies; 7+ messages in thread
From: Anthony Liguori @ 2011-09-20 20:12 UTC (permalink / raw)
To: Eric Blake; +Cc: libvir-list, QEMU Developers
On 09/20/2011 02:03 PM, Eric Blake wrote:
> On 09/20/2011 12:52 PM, Anthony Liguori wrote:
>> On 09/20/2011 01:01 PM, Eric Blake wrote:
>>> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>>>> The commit that prevents disk corruption on domain shutdown
>>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>>>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>>>> be shutdown properly and stay in a paused state. This patch tries to
>>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>>>> possibility of virtual disk corruption.
>>>> ---
>>>> src/qemu/qemu_capabilities.c | 7 +++++++
>>>> src/qemu/qemu_capabilities.h | 1 +
>>>> src/qemu/qemu_process.c | 19 +++++++++++++------
>>>> 3 files changed, 21 insertions(+), 6 deletions(-)
>>>
>>> ACK. But it would be nice if upstream qemu could give us a more reliable
>>> indication of whether the qemu SIGTERM bug is fixed, so that we don't
>>> corrupt
>>> data on a patched 0.14 or 0.15 qemu.
>>
>> Can you be a lot more specific about what bug you mean?
>>
>
> https://bugzilla.redhat.com/show_bug.cgi?id=739895
That just got applied, last week, so no, it's not in any release right now.
>
>>> That is, as part of fixing the bug in qemu,
>>> we should also update -help text or something similar, so that libvirt
>>> can avoid
>>> making decisions solely on version numbers.
>>
>> The version number *is* the right way to make decisions. We've gone
>> through this dozens of times.
>>
>> The fact that distros backport all sorts of stuff means that you need to
>> maintain a matrix of versions with features. It's not our (upstream
>> QEMU's) responsibility to tell you the differences that exist in forks
>> of QEMU.
>
> Version numbers are lousy, precisely because they are not granular enough.
> That's why the autoconf philosophy frowns so heavily on version checks, and
> prefers feature checks instead.
>
> We want to know which features are present,
Features and bugs are different things. I'm all for providing ways to detect
whether we support certain commands in QMP, command line options, etc.
not which versions introduced which
> features. In this case, we want to know about a particular feature (SIGTERM is
> not broken), which we know exists later than 0.15, but which might also exist as
> a backport in 0.14 or 0.15.
No, you want to know, does d9389b9664df561db796b18eb8309fffe58faf8b existing in
this build of QEMU. But makes d9389b more important than d296363 or db118fe72?
If you want to know whether a bug is fixed that is important to *you*, then you
should check the git log correlating to that version and embed that info in
libvirt. Then libvirt is entirely empowered to deem whatever bug fixes you
think are important to the table that you maintain.
> If qemu tells us that information, then upstream
> libvirt can make the decision correctly regardless of how distros backport the
> patch. But if qemu does not expose the information, then upstream libvirt must
> be pessimistic, and you've now forced the distros to do double-duty - they must
> backport both the qemu fix, and write a distro-specific libvirt patch that
> alters the version matrix to play with the distro build of qemu.
Or distros could use use the QEMU stable branch as their base and invest in
backporting to QEMU stable instead of maintaining private backport trees.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
2011-09-20 18:52 ` Anthony Liguori
@ 2011-09-21 9:24 ` Daniel P. Berrange
2011-09-21 10:30 ` Kevin Wolf
2 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrange @ 2011-09-21 9:24 UTC (permalink / raw)
To: Eric Blake; +Cc: libvir-list, QEMU Developers
On Tue, Sep 20, 2011 at 12:01:58PM -0600, Eric Blake wrote:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
> >The commit that prevents disk corruption on domain shutdown
> >(96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
> >0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
> >only recently in QEMU git. With affected QEMU binaries, domains cannot
> >be shutdown properly and stay in a paused state. This patch tries to
> >avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
> >wait a bit more between sending SIGTERM and SIGKILL to reduce the
> >possibility of virtual disk corruption.
> >---
> > src/qemu/qemu_capabilities.c | 7 +++++++
> > src/qemu/qemu_capabilities.h | 1 +
> > src/qemu/qemu_process.c | 19 +++++++++++++------
> > 3 files changed, 21 insertions(+), 6 deletions(-)
>
> ACK. But it would be nice if upstream qemu could give us a more
> reliable indication of whether the qemu SIGTERM bug is fixed, so
> that we don't corrupt data on a patched 0.14 or 0.15 qemu. That is,
> as part of fixing the bug in qemu, we should also update -help text
> or something similar, so that libvirt can avoid making decisions
> solely on version numbers.
-help is traditionally just a way to detecting features. I think it
would be rather an abuse of -help to start adding information about
every bugfix that affects mgmt apps too. So I think a version number
check is sufficient in this case.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
2011-09-20 18:52 ` Anthony Liguori
2011-09-21 9:24 ` Daniel P. Berrange
@ 2011-09-21 10:30 ` Kevin Wolf
2 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2011-09-21 10:30 UTC (permalink / raw)
To: Eric Blake; +Cc: libvir-list, Justin M. Forbes, QEMU Developers
Am 20.09.2011 20:01, schrieb Eric Blake:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.
I really think libvirt should never SIGKILL qemu unless it's explicitly
told so. Management tools should try to ask the user before doing so.
Killing qemu with SIGKILL is never safe.
>> ---
>> src/qemu/qemu_capabilities.c | 7 +++++++
>> src/qemu/qemu_capabilities.h | 1 +
>> src/qemu/qemu_process.c | 19 +++++++++++++------
>> 3 files changed, 21 insertions(+), 6 deletions(-)
>
> ACK. But it would be nice if upstream qemu could give us a more
> reliable indication of whether the qemu SIGTERM bug is fixed, so that we
> don't corrupt data on a patched 0.14 or 0.15 qemu.
0.14 shouldn't have this bug, but it looks like 0.15 has it.
Justin, can you cherry-pick d9389b96 and 941f511a into stable-0.15 in
order to fix -no-shutdown?
Kevin
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-09-21 10:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com>
2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
2011-09-20 18:52 ` Anthony Liguori
2011-09-20 19:03 ` Eric Blake
2011-09-20 20:12 ` Anthony Liguori
2011-09-21 9:24 ` Daniel P. Berrange
2011-09-21 10:30 ` Kevin Wolf
[not found] ` <20110920180649.GD4121@redhat.com>
2011-09-20 18:19 ` Eric Blake
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).