Message-ID: <4E78E374.4010104@redhat.com>
Date: Tue, 20 Sep 2011 13:03:16 -0600
From: Eric Blake
Subject: Re: [Qemu-devel] [libvirt] [PATCH] qemu: Fix shutdown regression
To: Anthony Liguori
Cc: libvir-list@redhat.com, QEMU Developers
References: <4E78D516.7080803@redhat.com> <4E78E0DC.2050303@codemonkey.ws>
In-Reply-To: <4E78E0DC.2050303@codemonkey.ws>

On 09/20/2011 12:52 PM, Anthony Liguori wrote:
> On 09/20/2011 01:01 PM, Eric Blake wrote:
>> On 09/20/2011 11:39 AM, Jiri Denemark wrote:
>>> The commit that prevents disk corruption on domain shutdown
>>> (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
>>> 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
>>> only recently in QEMU git. With affected QEMU binaries, domains cannot
>>> be shutdown properly and stay in a paused state. This patch tries to
>>> avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
>>> wait a bit more between sending SIGTERM and SIGKILL to reduce the
>>> possibility of virtual disk corruption.
>>> ---
>>>  src/qemu/qemu_capabilities.c |    7 +++++++
>>>  src/qemu/qemu_capabilities.h |    1 +
>>>  src/qemu/qemu_process.c      |   19 +++++++++++++------
>>>  3 files changed, 21 insertions(+), 6 deletions(-)
>>
>> ACK. But it would be nice if upstream qemu could give us a more reliable
>> indication of whether the qemu SIGTERM bug is fixed, so that we don't
>> corrupt data on a patched 0.14 or 0.15 qemu.
>
> Can you be a lot more specific about what bug you mean?

https://bugzilla.redhat.com/show_bug.cgi?id=739895

>> That is, as part of fixing the bug in qemu, we should also update
>> -help text or something similar, so that libvirt can avoid making
>> decisions solely on version numbers.
>
> The version number *is* the right way to make decisions. We've gone
> through this dozens of times.
>
> The fact that distros backport all sorts of stuff means that you need to
> maintain a matrix of versions with features. It's not our (upstream
> QEMU's) responsibility to tell you the differences that exist in forks
> of QEMU.

Version numbers are lousy precisely because they are not granular
enough. That's why the autoconf philosophy frowns so heavily on version
checks and prefers feature checks instead. We want to know which
features are present, not which versions introduced which features.

In this case, we want to know about one particular feature (SIGTERM is
not broken), which we know exists in releases after 0.15, but which
might also exist as a backport in 0.14 or 0.15. If qemu tells us that
information, then upstream libvirt can make the decision correctly
regardless of how distros backport the patch. But if qemu does not
expose the information, then upstream libvirt must be pessimistic, and
you've now forced the distros to do double duty: they must backport the
qemu fix, and also write a distro-specific libvirt patch that alters the
version matrix to match the distro build of qemu.
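To make the version-check versus feature-check distinction concrete, here is a minimal C sketch. It is not libvirt's actual code: every function name is hypothetical, the version encoding (major * 1000000 + minor * 1000 + micro) is an assumption, and the capability flag passed to `sigterm_broken_by_feature` is hypothetical too, since qemu does not advertise any such flag (which is exactly what is being asked for here).

```c
#include <stdbool.h>

/* Assumed encoding of a qemu version number: major*1000000 + minor*1000 + micro. */
static unsigned int qemu_version(unsigned int major, unsigned int minor,
                                 unsigned int micro)
{
    return major * 1000000u + minor * 1000u + micro;
}

/* Version check, as in the patch: treat every 0.14.x and 0.15.x binary as
 * having broken SIGTERM handling, even if a distro backported the fix. */
static bool sigterm_broken_by_version(unsigned int version)
{
    return version >= qemu_version(0, 14, 0) &&
           version < qemu_version(0, 16, 0);
}

/* Feature check, as argued for above: trust what the binary itself reports.
 * HYPOTHETICAL: 'advertises_sigterm_fix' stands in for a capability that
 * qemu would have to expose (e.g. via -help output); none exists today. */
static bool sigterm_broken_by_feature(bool advertises_sigterm_fix)
{
    return !advertises_sigterm_fix;
}
```

With a distro-patched 0.15.0 that carries the backported fix, `sigterm_broken_by_version` still reports the bug, so libvirt would escalate to SIGKILL and risk the very corruption the fix prevents. A feature check on the same binary would allow a clean SIGTERM shutdown without any distro-specific version matrix.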
-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org