qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
       [not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com>
@ 2011-09-20 18:01 ` Eric Blake
  2011-09-20 18:52   ` Anthony Liguori
                     ` (2 more replies)
       [not found] ` <20110920180649.GD4121@redhat.com>
  1 sibling, 3 replies; 7+ messages in thread
From: Eric Blake @ 2011-09-20 18:01 UTC (permalink / raw)
  To: libvir-list, QEMU Developers

On 09/20/2011 11:39 AM, Jiri Denemark wrote:
> The commit that prevents disk corruption on domain shutdown
> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
> only recently in QEMU git. With affected QEMU binaries, domains cannot
> be shutdown properly and stay in a paused state. This patch tries to
> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
> wait a bit more between sending SIGTERM and SIGKILL to reduce the
> possibility of virtual disk corruption.
> ---
>   src/qemu/qemu_capabilities.c |    7 +++++++
>   src/qemu/qemu_capabilities.h |    1 +
>   src/qemu/qemu_process.c      |   19 +++++++++++++------
>   3 files changed, 21 insertions(+), 6 deletions(-)

ACK.  But it would be nice if upstream qemu could give us a more 
reliable indication of whether the qemu SIGTERM bug is fixed, so that we 
don't corrupt data on a patched 0.14 or 0.15 qemu.  That is, as part of 
fixing the bug in qemu, we should also update -help text or something 
similar, so that libvirt can avoid making decisions solely on version 
numbers.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
       [not found] ` <20110920180649.GD4121@redhat.com>
@ 2011-09-20 18:19   ` Eric Blake
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2011-09-20 18:19 UTC (permalink / raw)
  To: Dave Allan; +Cc: libvir-list, QEMU Developers

On 09/20/2011 12:06 PM, Dave Allan wrote:
> On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.
>
> IMO, SIGKILL should only be sent at the explicit direction of the
> user, saying in effect, I'm ok with possible data corruption, I want
> the VM killed unconditionally.  I would rather leave VMs paused than
> risk corrupting data.  Let's get as much input as we can from the qemu
> folks before we go down this path.

That re-echos my sentiment that qemu needs to tell us whether the bug is 
fixed (we know that if version < 0.14, the bug is not present, and if 
version > 0.15, the bug is fixed, but it is the 0.1[45] window where we 
don't know if the vendor has back-ported the fix into the version of 
qemu that we are targetting, unless we get some help from qemu).

I also wonder if we should make it so:

virDomainDestroy(dom) fails with a reasonable message, rather than 
leaving the domain paused, if we think qemu has the bug, and require the 
user to do virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the 
means of the user explicitly requesting that they work around the qemu bug.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
  2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
@ 2011-09-20 18:52   ` Anthony Liguori
  2011-09-20 19:03     ` Eric Blake
  2011-09-21  9:24   ` Daniel P. Berrange
  2011-09-21 10:30   ` Kevin Wolf
  2 siblings, 1 reply; 7+ messages in thread
From: Anthony Liguori @ 2011-09-20 18:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, QEMU Developers

On 09/20/2011 01:01 PM, Eric Blake wrote:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.
>> ---
>> src/qemu/qemu_capabilities.c | 7 +++++++
>> src/qemu/qemu_capabilities.h | 1 +
>> src/qemu/qemu_process.c | 19 +++++++++++++------
>> 3 files changed, 21 insertions(+), 6 deletions(-)
>
> ACK. But it would be nice if upstream qemu could give us a more reliable
> indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt
> data on a patched 0.14 or 0.15 qemu.

Can you be a lot more specific about what bug you mean?

> That is, as part of fixing the bug in qemu,
> we should also update -help text or something similar, so that libvirt can avoid
> making decisions solely on version numbers.

The version number *is* the right way to make decisions.  We've gone through 
this dozens of times.

The fact that distros backport all sorts of stuff means that you need to 
maintain a matrix of versions with features.  It's not our (upstream QEMU's) 
responsibility to tell you the differences that exist in forks of QEMU.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
  2011-09-20 18:52   ` Anthony Liguori
@ 2011-09-20 19:03     ` Eric Blake
  2011-09-20 20:12       ` Anthony Liguori
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2011-09-20 19:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: libvir-list, QEMU Developers

On 09/20/2011 12:52 PM, Anthony Liguori wrote:
> On 09/20/2011 01:01 PM, Eric Blake wrote:
>> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>>> The commit that prevents disk corruption on domain shutdown
>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>>> be shutdown properly and stay in a paused state. This patch tries to
>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>>> possibility of virtual disk corruption.
>>> ---
>>> src/qemu/qemu_capabilities.c | 7 +++++++
>>> src/qemu/qemu_capabilities.h | 1 +
>>> src/qemu/qemu_process.c | 19 +++++++++++++------
>>> 3 files changed, 21 insertions(+), 6 deletions(-)
>>
>> ACK. But it would be nice if upstream qemu could give us a more reliable
>> indication of whether the qemu SIGTERM bug is fixed, so that we don't
>> corrupt
>> data on a patched 0.14 or 0.15 qemu.
>
> Can you be a lot more specific about what bug you mean?
>

https://bugzilla.redhat.com/show_bug.cgi?id=739895

>> That is, as part of fixing the bug in qemu,
>> we should also update -help text or something similar, so that libvirt
>> can avoid
>> making decisions solely on version numbers.
>
> The version number *is* the right way to make decisions. We've gone
> through this dozens of times.
>
> The fact that distros backport all sorts of stuff means that you need to
> maintain a matrix of versions with features. It's not our (upstream
> QEMU's) responsibility to tell you the differences that exist in forks
> of QEMU.

Version numbers are lousy, precisely because they are not granular 
enough.  That's why the autoconf philosophy frowns so heavily on version 
checks, and prefers feature checks instead.

We want to know which features are present, not which versions 
introduced which features.  In this case, we want to know about a 
particular feature (SIGTERM is not broken), which we know exists later 
than 0.15, but which might also exist as a backport in 0.14 or 0.15.  If 
qemu tells us that information, then upstream libvirt can make the 
decision correctly regardless of how distros backport the patch.  But if 
qemu does not expose the information, then upstream libvirt must be 
pessimistic, and you've now forced the distros to do double-duty - they 
must backport both the qemu fix, and write a distro-specific libvirt 
patch that alters the version matrix to play with the distro build of qemu.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
  2011-09-20 19:03     ` Eric Blake
@ 2011-09-20 20:12       ` Anthony Liguori
  0 siblings, 0 replies; 7+ messages in thread
From: Anthony Liguori @ 2011-09-20 20:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, QEMU Developers

On 09/20/2011 02:03 PM, Eric Blake wrote:
> On 09/20/2011 12:52 PM, Anthony Liguori wrote:
>> On 09/20/2011 01:01 PM, Eric Blake wrote:
>>> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>>>> The commit that prevents disk corruption on domain shutdown
>>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>>>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>>>> be shutdown properly and stay in a paused state. This patch tries to
>>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>>>> possibility of virtual disk corruption.
>>>> ---
>>>> src/qemu/qemu_capabilities.c | 7 +++++++
>>>> src/qemu/qemu_capabilities.h | 1 +
>>>> src/qemu/qemu_process.c | 19 +++++++++++++------
>>>> 3 files changed, 21 insertions(+), 6 deletions(-)
>>>
>>> ACK. But it would be nice if upstream qemu could give us a more reliable
>>> indication of whether the qemu SIGTERM bug is fixed, so that we don't
>>> corrupt
>>> data on a patched 0.14 or 0.15 qemu.
>>
>> Can you be a lot more specific about what bug you mean?
>>
>
> https://bugzilla.redhat.com/show_bug.cgi?id=739895

That just got applied, last week, so no, it's not in any release right now.

>
>>> That is, as part of fixing the bug in qemu,
>>> we should also update -help text or something similar, so that libvirt
>>> can avoid
>>> making decisions solely on version numbers.
>>
>> The version number *is* the right way to make decisions. We've gone
>> through this dozens of times.
>>
>> The fact that distros backport all sorts of stuff means that you need to
>> maintain a matrix of versions with features. It's not our (upstream
>> QEMU's) responsibility to tell you the differences that exist in forks
>> of QEMU.
>
> Version numbers are lousy, precisely because they are not granular enough.
> That's why the autoconf philosophy frowns so heavily on version checks, and
> prefers feature checks instead.
>
> We want to know which features are present,

Features and bugs are different things.  I'm all for providing ways to detect 
whether we support certain commands in QMP, command line options, etc.

  not which versions introduced which
> features. In this case, we want to know about a particular feature (SIGTERM is
> not broken), which we know exists later than 0.15, but which might also exist as
> a backport in 0.14 or 0.15.

No, you want to know, does d9389b9664df561db796b18eb8309fffe58faf8b existing in 
this build of QEMU.  But makes d9389b more important than d296363 or db118fe72?

If you want to know whether a bug is fixed that is important to *you*, then you 
should check the git log correlating to that version and embed that info in 
libvirt.  Then libvirt is entirely empowered to deem whatever bug fixes you 
think are important to the table that you maintain.

> If qemu tells us that information, then upstream
> libvirt can make the decision correctly regardless of how distros backport the
> patch. But if qemu does not expose the information, then upstream libvirt must
> be pessimistic, and you've now forced the distros to do double-duty - they must
> backport both the qemu fix, and write a distro-specific libvirt patch that
> alters the version matrix to play with the distro build of qemu.

Or distros could use use the QEMU stable branch as their base and invest in 
backporting to QEMU stable instead of maintaining private backport trees.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
  2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
  2011-09-20 18:52   ` Anthony Liguori
@ 2011-09-21  9:24   ` Daniel P. Berrange
  2011-09-21 10:30   ` Kevin Wolf
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrange @ 2011-09-21  9:24 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, QEMU Developers

On Tue, Sep 20, 2011 at 12:01:58PM -0600, Eric Blake wrote:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
> >The commit that prevents disk corruption on domain shutdown
> >(96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
> >0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
> >only recently in QEMU git. With affected QEMU binaries, domains cannot
> >be shutdown properly and stay in a paused state. This patch tries to
> >avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
> >wait a bit more between sending SIGTERM and SIGKILL to reduce the
> >possibility of virtual disk corruption.
> >---
> >  src/qemu/qemu_capabilities.c |    7 +++++++
> >  src/qemu/qemu_capabilities.h |    1 +
> >  src/qemu/qemu_process.c      |   19 +++++++++++++------
> >  3 files changed, 21 insertions(+), 6 deletions(-)
> 
> ACK.  But it would be nice if upstream qemu could give us a more
> reliable indication of whether the qemu SIGTERM bug is fixed, so
> that we don't corrupt data on a patched 0.14 or 0.15 qemu.  That is,
> as part of fixing the bug in qemu, we should also update -help text
> or something similar, so that libvirt can avoid making decisions
> solely on version numbers.

-help is traditionally just a way to detecting features. I think it
would be rather an abuse of -help to start adding information about
every bugfix that affects mgmt apps too. So I think a version number
check is sufficient in this case.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
  2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
  2011-09-20 18:52   ` Anthony Liguori
  2011-09-21  9:24   ` Daniel P. Berrange
@ 2011-09-21 10:30   ` Kevin Wolf
  2 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2011-09-21 10:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, Justin M. Forbes, QEMU Developers

Am 20.09.2011 20:01, schrieb Eric Blake:
> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>> be shutdown properly and stay in a paused state. This patch tries to
>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>> possibility of virtual disk corruption.

I really think libvirt should never SIGKILL qemu unless it's explicitly
told so. Management tools should try to ask the user before doing so.
Killing qemu with SIGKILL is never safe.

>> ---
>>   src/qemu/qemu_capabilities.c |    7 +++++++
>>   src/qemu/qemu_capabilities.h |    1 +
>>   src/qemu/qemu_process.c      |   19 +++++++++++++------
>>   3 files changed, 21 insertions(+), 6 deletions(-)
> 
> ACK.  But it would be nice if upstream qemu could give us a more 
> reliable indication of whether the qemu SIGTERM bug is fixed, so that we 
> don't corrupt data on a patched 0.14 or 0.15 qemu. 

0.14 shouldn't have this bug, but it looks like 0.15 has it.

Justin, can you cherry-pick d9389b96 and 941f511a into stable-0.15 in
order to fix -no-shutdown?

Kevin

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-09-21 10:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <b384197e42e451311752aa03dba09fd86863f9c0.1316540355.git.jdenemar@redhat.com>
2011-09-20 18:01 ` [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression Eric Blake
2011-09-20 18:52   ` Anthony Liguori
2011-09-20 19:03     ` Eric Blake
2011-09-20 20:12       ` Anthony Liguori
2011-09-21  9:24   ` Daniel P. Berrange
2011-09-21 10:30   ` Kevin Wolf
     [not found] ` <20110920180649.GD4121@redhat.com>
2011-09-20 18:19   ` Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).