From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: live migration vs device assignment (motivation)
Date: Tue, 29 Dec 2015 19:15:23 +0200
Message-ID: <20151229191142-mutt-send-email-mst@redhat.com>
References: <20151210114114.GE2570@work-vm> <56698E68.5040207@intel.com>
 <566D9320.8000209@intel.com> <567CEA53.5030601@intel.com>
 <20151225140336-mutt-send-email-mst@redhat.com> <56817476.8080607@intel.com>
 <20151229184426-mutt-send-email-mst@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Yang Zhang , "Tantilov, Emil S" , kvm@vger.kernel.org, aik@ozlabs.ru,
 qemu-devel@nongnu.org, lcapitulino@redhat.com, Blue Swirl , kraxel@redhat.com,
 "Rustad, Mark D" , quintela@redhat.com, "Skidmore, Donald C" , Alexander Graf ,
 Or Gerlitz , "Dr. David Alan Gilbert" , Alex Williamson , Anthony Liguori ,
 cornelia.huck@de.ibm.com, "Lan, Tianyu" , Ard Biesheuvel , "Dong, Eddie" ,
 "Jani, Nrupal" , amit.shah@redhat.com, Paolo Bonzini
To: Alexander Duyck
Content-Disposition: inline
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On Tue, Dec 29, 2015 at 09:04:51AM -0800, Alexander Duyck wrote:
> On Tue, Dec 29, 2015 at 8:46 AM, Michael S. Tsirkin wrote:
> > On Tue, Dec 29, 2015 at 01:42:14AM +0800, Lan, Tianyu wrote:
> >>
> >>
> >> On 12/25/2015 8:11 PM, Michael S. Tsirkin wrote:
> >> >As long as you keep up this vague talk about performance during
> >> >migration, without even bothering with any measurements, this patchset
> >> >will keep going nowhere.
> >> >
> >>
> >> I measured network service downtime for "keep device alive"(RFC patch V1
> >> presented) and "put down and up network interface"(RFC patch V2 presented)
> >> during migration with some optimizations.
> >>
> >> The former is around 140ms and the latter is around 240ms.
> >>
> >> My patchset relies on the mailbox irq which doesn't work in the suspend state
> >> and so can't get downtime for suspend/resume cases. Will try to get the
> >> result later.
> >
> >
> > Interesting. So you are saying merely ifdown/ifup is 100ms?
> > This does not sound reasonable.
> > Is there a chance you are e.g. getting IP from dhcp?
>
>
> Actually it wouldn't surprise me if that is due to a reset logic in
> the driver. For starters there is a 10 msec delay in the call
> ixgbevf_reset_hw_vf which I believe is present to allow the PF time to
> clear registers after the VF has requested a reset. There is also a
> 10 to 20 msec sleep in ixgbevf_down which occurs after the Rx queues
> were disabled. That is in addition to the fact that the function that
> disables the queues does so serially and polls each queue until the
> hardware acknowledges that the queues are actually disabled. The
> driver also does the serial enable with poll logic on re-enabling the
> queues which likely doesn't help things.
>
> Really this driver is probably in need of a refactor to clean the
> cruft out of the reset and initialization logic. I suspect we have
> far more delays than we really need and that is the source of much of
> the slow down.
>
> - Alex

For ifdown, why is there any need to reset the device at all?
Is it so buffers can be reclaimed?

-- 
MST