From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Lan, Tianyu"
Subject: Re: live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 22:23:56 +0800
Message-ID: <56698AFC.7060907@intel.com>
References: <1448372127-28115-1-git-send-email-tianyu.lan@intel.com>
 <20151207165039.GA20210@redhat.com>
 <56685631.50700@intel.com>
 <20151209215334-mutt-send-email-mst@redhat.com>
 <5668EBD6.9080506@intel.com>
 <20151210095213-mutt-send-email-mst@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: aik@ozlabs.ru, alex.williamson@redhat.com, amit.shah@redhat.com,
 anthony@codemonkey.ws, ard.biesheuvel@linaro.org, blauwirbel@gmail.com,
 cornelia.huck@de.ibm.com, eddie.dong@intel.com, nrupal.jani@intel.com,
 agraf@suse.de, kvm@vger.kernel.org, pbonzini@redhat.com,
 qemu-devel@nongnu.org, emil.s.tantilov@intel.com, gerlitz.or@gmail.com,
 donald.c.skidmore@intel.com, mark.d.rustad@intel.com, kraxel@redhat.com,
 lcapitulino@redhat.com, quintela@redhat.com
To: "Michael S. Tsirkin"
Return-path:
Received: from mga03.intel.com ([134.134.136.65]:25463 "EHLO mga03.intel.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752213AbbLJOYH (ORCPT ); Thu, 10 Dec 2015 09:24:07 -0500
In-Reply-To: <20151210095213-mutt-send-email-mst@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 12/10/2015 4:38 PM, Michael S. Tsirkin wrote:
> Let's assume you do save state and do have a way to detect
> whether state matches a given hardware. For example,
> driver could store firmware and hardware versions
> in the state, and then on destination, retrieve them
> and compare. It will be pretty common that you have a mismatch,
> and you must not just fail migration. You need a way to recover,
> maybe with more downtime.
>
>
> Second, you can change the driver but you can not be sure it will have
> the chance to run at all. Host overload is a common reason to migrate
> out of the host. You also can not trust guest to do the right thing.
> So how long do you want to wait until you decide guest is not
> cooperating and kill it? Most people will probably experiment a bit and
> then add a bit of a buffer. This is not robust at all.
>
> Again, maybe you ask driver to save state, and if it does
> not respond for a while, then you still migrate,
> and driver has to recover on destination.
>
>
> With the above in mind, you need to support two paths:
> 1. "good path": driver stores state on source, checks it on destination
> detects a match and restores state into the device
> 2. "bad path": driver does not store state, or detects a mismatch
> on destination. driver has to assume device was lost,
> and reset it
>
> So what I am saying is, implement bad path first. Then good path
> is an optimization - measure whether it's faster, and by how much.
>

This sounds reasonable. The driver should be able to perform such a
check to ensure hardware and firmware consistency after migration, and
to reset the device when migration happens at an unexpected point.

> Also, it would be nice if on the bad path there was a way
> to switch to another driver entirely, even if that means
> a bit more downtime. For example, have a way for driver to
> tell Linux it has to re-do probing for the device.

I just glanced at the device core code. device_reprobe() does what you
describe:

/**
 * device_reprobe - remove driver for a device and probe for a new driver
 * @dev: the device to reprobe
 *
 * This function detaches the attached driver (if any) for the given
 * device and restarts the driver probing process. It is intended
 * to use if probing criteria changed during a devices lifetime and
 * driver attachment should change accordingly.
 */
int device_reprobe(struct device *dev)
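
To make the two paths discussed above concrete, here is a rough userspace C sketch of the decision a driver might make on the destination. It is only an illustration under stated assumptions: struct saved_state, struct device_info, device_reset() and resume_after_migration() are hypothetical names, not existing kernel or ixgbevf interfaces, and ring_head merely stands in for real device state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot a driver might save on the source host. */
struct saved_state {
    bool     valid;       /* false if the driver never got to save state */
    uint32_t hw_version;
    uint32_t fw_version;
    uint32_t ring_head;   /* stand-in for real device state */
};

/* Hypothetical view of the device found on the destination host. */
struct device_info {
    uint32_t hw_version;
    uint32_t fw_version;
    uint32_t ring_head;
};

enum resume_path { RESUME_RESTORED, RESUME_RESET };

/* Bad path: assume the device was lost and bring it up from scratch. */
static void device_reset(struct device_info *dev)
{
    dev->ring_head = 0;
}

/*
 * Try the good path only when state was saved and the hardware and
 * firmware versions match; in every other case fall back to a reset.
 */
static enum resume_path resume_after_migration(const struct saved_state *st,
                                               struct device_info *dev)
{
    if (st && st->valid &&
        st->hw_version == dev->hw_version &&
        st->fw_version == dev->fw_version) {
        dev->ring_head = st->ring_head;   /* good path: restore state */
        return RESUME_RESTORED;
    }

    device_reset(dev);                    /* bad path: state missing or mismatched */
    return RESUME_RESET;
}
```

The reset branch is the default and the restore branch is an extra check in front of it, which matches the suggestion to implement the bad path first and treat the good path as an optimization to be measured.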