From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=34304 helo=eggs.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.43) id 1OxlGm-0006uf-7P
 for qemu-devel@nongnu.org; Mon, 20 Sep 2010 14:39:17 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
 (envelope-from ) id 1OxlGk-0002s7-OS
 for qemu-devel@nongnu.org; Mon, 20 Sep 2010 14:39:16 -0400
Received: from mail-qy0-f180.google.com ([209.85.216.180]:56386)
 by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from )
 id 1OxlGk-0002s1-Km
 for qemu-devel@nongnu.org; Mon, 20 Sep 2010 14:39:14 -0400
Received: by qyk31 with SMTP id 31so4898446qyk.4
 for ; Mon, 20 Sep 2010 11:39:14 -0700 (PDT)
Message-ID: <4C97AA44.8000403@codemonkey.ws>
Date: Mon, 20 Sep 2010 13:39:00 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <20100920163042.GA29466@redhat.com>
 <4C978EC9.20907@codemonkey.ws>
 <20100920164758.GB29862@redhat.com>
 <4C979258.9020701@codemonkey.ws>
 <20100920171439.GF29862@redhat.com>
 <4C97A474.8040900@codemonkey.ws>
 <20100920182459.GE30611@redhat.com>
In-Reply-To: <20100920182459.GE30611@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH] net: delay peer host device delete
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org

On 09/20/2010 01:24 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 01:14:12PM -0500, Anthony Liguori wrote:
>
>> On 09/20/2010 12:14 PM, Michael S. Tsirkin wrote:
>>
>>> On Mon, Sep 20, 2010 at 11:56:56AM -0500, Anthony Liguori wrote:
>>>
>>>> On 09/20/2010 11:47 AM, Michael S. Tsirkin wrote:
>>>>
>>>>> On Mon, Sep 20, 2010 at 11:41:45AM -0500, Anthony Liguori wrote:
>>>>>
>>>>>> On 09/20/2010 11:30 AM, Michael S. Tsirkin wrote:
>>>>>>
>>>>>>> With -netdev, virtio devices present offload
>>>>>>> features to guest, depending on the backend used.
>>>>>>> Thus, removing host netdev peer while guest is
>>>>>>> active leads to guest-visible inconsistency and/or crashes.
>>>>>>> See e.g. https://bugzilla.redhat.com/show_bug.cgi?id=623735
>>>>>>>
>>>>>>> As a solution, while guest (NIC) peer device exists,
>>>>>>> we must prevent the host peer from being deleted.
>>>>>>>
>>>>>>> This patch does this by adding peer_deleted flag in nic state:
>>>>>>> if host device is going away while guest device
>>>>>>> is around, set this flag and keep host device around
>>>>>>> for as long as guest device exists.
>>>>>>>
>>>>>> Having an unclear life cycle really worries me.
>>>>>>
>>>>>> Wouldn't the more correct solution be to avoid removing the netdev
>>>>>> device until after the peer has successfully been removed?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Anthony Liguori
>>>>>>
>>>>> This is exactly what the patch does.
>>>>>
>>>> At the management layer instead of doing it magically in the backend.
>>>>
>>> The amount of pain this inflicts on management would be considerable.
>>> Hotplug commands were designed to be asynchronous
>>> (starts the process, does not wait for it to complete), maybe that
>>> was a mistake but we cannot change semantics at will now.
>>>
>>> Add new commands, okay, but existing ones should work and get fixed
>>> if there's a bug.
>>>
>> But having commands that are impossible to use correctly is not very good.
>>
> So we will have to fix the existing commands so they can be used
> correctly. Since the device is removed from the list
> shown to the monitor, I do not really see why the user
> cares that the backend is actually still around
> until the device is removed.
>

That's even more wrong and maybe I don't understand what you're saying.

But the test case is easy. acpiphp is not loaded. You do a device_del
of a device. What happens?

You do a netdev_del immediately afterwards, what are you guaranteed as a
management tool?
If I do an info network, you're telling me I don't see the netdev device
even though the device is still there and the guest is actively using it?

That can't possibly be a good thing.

>> 4) async device removal + remove backend
>>
>> Whereas remove backend may or may not cause removal depending on
>> whether device removal has happened. So it's really async removal
>> but it doesn't happen deterministically on its own. What happens
>> if you call remove backend before starting async device removal?
>>
> It won't be removed until device is removed.
>

Which is non-deterministic and guest controlled.

>> What if the guest never removes the device?
>>
> Not really different from guest never reacting to nic hotplug.
> If you want to fix this, we'll need a "force" flag to delete.
>

We need to make sure management tools are aware that pci hot unplug can
fail. We should design our interfaces to encourage this awareness.

Force is not necessarily needed.

>> What if a reset
>> happens?
>>
> I think reset will complete the hotplug. If it does not we need to fix
> it anyway.
>

I'm fairly sure it doesn't FWIW.

>> One advantage of (1) is that there are no tricky life cycle
>> considerations. If we did (3), we would have to think through what
>> happens if a guest doesn't respond to an unplug request.
>>
>> Regards,
>>
>> Anthony Liguori
>>
> All very well, but this ignores the issue:
>
> We have told management that a way to remove a frontend backend pair is
> by giving two commands.

That's the problem. This is fundamentally broken.

> Management has implemented this. Now we need to
> have qemu do the right thing.
>

The only way to do this correctly is to make device_del block until the
operation completes.

Unfortunately, this becomes a libvirt DoS which would cause all sorts of
problems.

I don't see a lot of options that allow the management tools to continue
doing what they're doing.
This cannot work properly unless there is a management interface that is
explicitly aware that 1) pci hot unplug can and will fail and 2) the
device is still there until the unplug succeeds (which may be forever).

Regards,

Anthony Liguori
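
To make the two requirements above concrete, here is a minimal sketch of
the life cycle as a toy Python model. It is not QEMU code. ToyVM,
guest_ack_unplug and remove_pair are hypothetical names; only device_del
and netdev_del correspond to monitor commands named in this thread, and
their behaviour here (in particular netdev_del refusing while a NIC still
uses the backend) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Toy model of the NIC/backend life cycle; hypothetical, not QEMU."""
    nics: dict = field(default_factory=dict)       # NIC id -> netdev id
    netdevs: set = field(default_factory=set)      # existing host backends
    pending_unplug: set = field(default_factory=set)

    def device_del(self, nic_id):
        """Asynchronous: only asks the guest to release the NIC."""
        if nic_id not in self.nics:
            raise KeyError(nic_id)
        self.pending_unplug.add(nic_id)

    def guest_ack_unplug(self, nic_id):
        """Guest-controlled step; may never run (e.g. acpiphp not loaded)."""
        if nic_id in self.pending_unplug:
            self.pending_unplug.discard(nic_id)
            del self.nics[nic_id]

    def netdev_del(self, netdev_id):
        """Assumed behaviour: fail loudly while a NIC still uses the backend."""
        if netdev_id in self.nics.values():
            raise RuntimeError(f"netdev {netdev_id} is still in use by a NIC")
        self.netdevs.remove(netdev_id)


def remove_pair(vm, nic_id, netdev_id, guest_cooperates, max_polls=3):
    """Management flow: request unplug, poll, delete backend only on success."""
    vm.device_del(nic_id)
    for _ in range(max_polls):
        if guest_cooperates:
            vm.guest_ack_unplug(nic_id)  # stands in for the guest's response
        if nic_id not in vm.nics:
            vm.netdev_del(netdev_id)
            return True
    return False  # unplug never completed; NIC and backend both remain
```

In this model a caller that gets False back knows the device and its
backend are both still present and visible, which is the awareness the
management interface would need to have.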