From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39928) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xt4Ir-00064l-TW for qemu-devel@nongnu.org; Mon, 24 Nov 2014 19:48:32 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xt4Il-0001UU-Of for qemu-devel@nongnu.org; Mon, 24 Nov 2014 19:48:25 -0500 From: Don Slutz Message-ID: <5473D1CE.7040102@terremark.com> Date: Mon, 24 Nov 2014 19:48:14 -0500 MIME-Version: 1.0 References: <1416259239-13281-1-git-send-email-dslutz@verizon.com> <87k32r8jc8.fsf@blackfin.pond.sub.org> <546E339E.4030202@terremark.com> <87fvdd6j29.fsf@blackfin.pond.sub.org> <20141121104945.GH3165@work-vm> In-Reply-To: <20141121104945.GH3165@work-vm> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: Kevin Wolf , Stefano Stabellini , qemu-stable@nongnu.org, qemu-devel@nongnu.org, Don Slutz , Markus Armbruster , Stefan Hajnoczi On 11/21/14 05:49, Dr. David Alan Gilbert wrote: > * Markus Armbruster (armbru@redhat.com) wrote: >> Don Slutz writes: >> >>> On 11/19/14 07:29, Markus Armbruster wrote: >>>> Don Slutz writes: >>>> >>>>> The other callers to blk_set_enable_write_cache() in this file >>>>> already check for s->blk == NULL. >>>>> >>>>> Signed-off-by: Don Slutz >>>>> --- >>>>> >>>>> I think this is a bugfix that should be back ported to stable >>>>> releases. >>>>> >>>>> I also think this should be done in xen's copy of QEMU for 4.5 with >>>>> back port(s) to active stable releases. >>>>> >>>>> Note: In 2.1 and earlier the routine is >>>>> bdrv_set_enable_write_cache(); variable is s->bs. >>>> Got a reproducer? >>> yes. Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on >>> CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined. >>> >>> >>>> I'm asking because I believe s->identify_set implies s->blk. >>>> s->identify_set is initialized to zero, and gets set to non-zero exactly >>>> on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in >>>> ide_identify(), ide_atapi_identify() or ide_cfata_identify(), >>>> respectively. Only called via cmd_identify() / cmd_identify_packet() >>>> via ide_exec_cmd(). The latter immediately fails when !s->blk: >>>> >>>> s = idebus_active_if(bus); >>>> /* ignore commands to non existent slave */ >>>> if (s != bus->ifs && !s->blk) { >>>> return; >>>> } >>> I do think that you are right. I have now spent more time on why I am >>> seeing this. >>> >>> >>>> Even if I'm right, your patch is fine, because it makes this spot more >>>> obviously correct, and consistent with the other uses of >>>> blk_set_enable_write_cache(). The case for stable is weak, though. >>>> >>> I had not fully tracked down what is happening before sending the bugfix. >>> I have now done more debugging, and have tracked it down to xen 4.4 >>> now using "-nodefaults" with QEMU. >>> >>> I needed to add output to QEMU to track this down because I have long >>> command lines... >>> >>> (all I get for ps -ef): >> [...] >>> Which is missing that option. >>> >>> The ide that was aborting in this case is the cdrom at hdc that is added >>> if you do not specify "-nodefaults". >>> >>> Since this is a "changed" machine config, I am no longer as sure as what >>> versions this needs to be in. >>> >>> If I put my QEMU hat on, it does not look like a back port is needed. >>> However >>> for xen it would be nice. >>> >>> I do not know how the QEMU community feels about migration from a config >>> without "-nodefaults" to one with "-nodefaults" as the only difference. >> So you have a CD-ROM on the source, but not on the destination? Yes, QEMU in the source has an empty CD-ROM, but not on the destination. xen does not know that QEMU added a CD-ROM drive in hdc by default. >> That can't work. I guess it broke for you in an unusual way (target >> crashes) rather than the usual way (target rejects migration data for a >> device it doesn't have) due to our convoluted IDE data structures. With >> your patch applied it should break the usual way. Does it? Nope. It does not break at all. The migration "works". The target accepts the hdc data. It looks like to me that all 4 IDE state data is sent. >> Management tools should use -nodefaults. But if it mixes default and >> -nodefaults in migration, recreating the stuff it got by default but >> doesn't get with -nodefaults is its own responsibility. Yes. Since xen did not ask for a CD-ROM, it did not expect to need to create it. xen was "fixed" to use -nodefaults. > Well, mostly - we wouldn't expect a migration to work if the source/dest > didn't match exactly; but QEMU shouldn't seg. Well, this change prevents a seg. So you are in favor of having this backported? -Don Slutz > Dave > > -- > Dr. David Alan Gilbert /dgilbert@redhat.com / Manchester, UK